Synthetic elements for enhancing expression of genes in plant cells are disclosed. These include a promoter with a "TATA to start" 
a high level under the transcriptional control of a recombinant promoter having at least one upstream activating region of the 35S CaMV 
SYNTHETIC PROMOTERS 

RELATED APPLICATIONS 
This application is a continuation-in-part of copending U.S. Patent Application Serial No. 
08/661,601, filed on June 11, 1996, herein incorporated by reference. 

FIELD OF THE INVENTION 

This invention relates generally to the field of plant molecular biology and in particular 
to enhanced expression of desired structural genes in both monocotyledonous and dicotyledonous 
plants. 

BACKGROUND OF THE INVENTION 

Gene expression encompasses a number of steps leading from the DNA template to the final 
protein or protein product. Control and regulation of gene expression can occur through 
numerous mechanisms. The initiation of transcription of a gene is generally thought of as the 
predominant control of gene expression. The transcriptional controls (or promoters) are 
generally confined to relatively short sequences embedded in the 5'-flanking or upstream region 
of the transcribed gene. There are DNA sequences which affect gene expression in response to 
environmental stimuli, nutrient availability, or adverse conditions including heat shock, 
anaerobiosis or the presence of heavy metals. There are also DNA sequences which control gene 
expression during development or in a tissue- or organ-specific fashion. 

Promoters contain the signals for RNA polymerase to begin transcription so that protein 
synthesis can proceed. DNA-binding nuclear proteins interact specifically with these cognate 
promoter DNA sequences to promote the formation of the transcriptional complex and eventually 
initiate the gene expression process. 

WO 99/43838 PCT/US99/03863 

One of the most common sequence motifs present in the promoters of genes transcribed by the 
eukaryotic RNA polymerase II (polII) system is the "TATA" element, which resides upstream of 
the start of transcription. Eukaryotic promoters are complex and are comprised of components 
which include a TATA box consensus sequence at about 35 base pairs 5' relative to the 
transcription start site or cap site, which is defined as +1. The TATA motif is the site where the 
TATA-binding-protein (TBP), as part of a complex of several polypeptides (TFIID complex), 
binds and productively interacts (directly or indirectly) with factors bound to other sequence 
elements of the promoter. This TFIID complex in turn recruits the RNA polymerase II complex 
to be positioned for the start of transcription, generally 25 to 30 base pairs downstream of the 
TATA element, and promotes elongation, thus producing RNA molecules. The sequences around 
the start of transcription (designated INR) of some polII genes seem to provide an alternate 
binding site for factors that also recruit members of the TFIID complex and thus "activate" 
transcription. These INR sequences are particularly relevant in promoters that lack functional 
TATA elements, providing the core promoter binding sites for eventual transcription. It has been 
proposed that promoters containing both a functional TATA and INR motif are the most efficient 
in transcriptional activity (Zenzie-Gregory et al. (1992) J. Biol. Chem. 267:2823-2830). 

In most instances sequence elements other than the TATA motif are required for accurate 
transcription. Such elements are often located upstream of the TATA motif and a subset may 
have homology to the consensus sequence CCAAT. 

Other DNA sequences have been found to elevate the overall level of expression of 
nearby genes. One of the more common classes of such elements resides far upstream 
from the initiation site and seems to exhibit position- and orientation-independent 
characteristics. These far upstream elements have been designated enhancers. 

Less common, by virtue of their specificities, are sequences that interact with specific 
DNA binding factors. These sequence motifs are collectively known as upstream elements, 
which are usually position and orientation dependent. 

Many upstream elements have been identified in a number of plant promoters, based 
initially on function and secondarily on sequence homologies. These promoter upstream 
elements range widely in type of control: from environmental responses (temperature, 
moisture, wounding, etc.) and developmental cues (germination, seed maturation, flowering, 
etc.) to spatial information (tissue specificity). These elements also seem to exhibit 
modularity in that they may be exchanged with other elements while maintaining their 
characteristic control over gene expression. 



Promoters are usually positioned 5' or upstream relative to the start of the coding region 
of the corresponding gene, and the entire region containing all the ancillary elements affecting 
regulation or absolute levels of transcription may be comprised of less than 100 base pairs or as 
much as 1 kilobase pair. 

A number of promoters which are active in plant cells have been described in the 
literature. These include the nopaline synthase (NOS) and octopine synthase (OCS) promoters 
(which are carried on tumor-inducing plasmids of Agrobacterium tumefaciens), the cauliflower 
mosaic virus (CaMV) 19S and 35S promoters, the light-inducible promoter from the small 
subunit of ribulose bisphosphate carboxylase (ssRUBISCO, a very abundant plant polypeptide), 
and the sucrose synthase promoter. All of these promoters have been used to 
create various types of DNA constructs which have been expressed in plants. (See for example 
PCT publication WO84/02913, Rogers et al.) 

Two promoters that have been widely used in plant cell transformations are those of the 
genes encoding alcohol dehydrogenase, AdhI and AdhII. Both genes are induced after the onset 
of anaerobiosis. Maize AdhI has been cloned and sequenced, as has AdhII. Formation of 
an AdhI chimeric gene, Adh-Cat, comprising the AdhI promoter linked to the chloramphenicol 
acetyltransferase (CAT) coding sequences and nopaline synthase (NOS) 3' signal, caused CAT 
expression at approximately 4-fold higher levels at low oxygen concentrations than under control 
conditions. Sequence elements necessary for anaerobic induction of the ADH-CAT chimeric 
gene have also been identified. An anaerobic regulatory element (ARE) lies between 
positions -140 and -99 of the maize AdhI promoter and is composed of at least two sequence 
elements, at positions -133 to -124 and positions -113 to -99, both of which have been found to 
be necessary and sufficient for low oxygen expression of ADH-CAT gene activity. The Adh 
promoter, however, responds to anaerobiosis and is not a constitutive promoter, which 
drastically limits its effectiveness. 

Another commonly used promoter is the 35S promoter of Cauliflower Mosaic Virus. The 
CaMV 35S promoter is a dicot virus promoter; however, it directs expression of genes 
introduced into protoplasts of both dicots and monocots. The 35S promoter is a very strong 
promoter, and this accounts for its widespread use for high level expression of traits in transgenic 
plants. The CaMV 35S promoter, however, has also demonstrated relatively low activity in several 
agriculturally significant graminaceous plants such as wheat. While these promoters all give high 
expression in dicots, few give high levels of expression in monocots. A need exists for 
synthetic promoters and other elements that induce expression in transformed monocot protoplast 
cells. 

SUMMARY OF THE INVENTION 

Methods and compositions for the expression of heterologous sequences in host cells are 
provided. The compositions find particular use in controlling the expression of sequences in 
plants. The compositions of the invention comprise promoter sequences. In particular, a novel 
synthetic core promoter molecule and regulatory elements useful in controlling expression in 
target cells are provided. The core promoter comprises a TATA box and a start of transcription. 
Further, the "TATA to start" region is 64% or greater GC rich. The regulatory elements include 
a novel upstream element and upstream activating regions. The upstream activating region is 
different from the synthetic upstream element. The elements can be used together or with other 
promoter elements to control expression of sequences of interest. 

It is a primary object of the invention to provide synthetic regulatory elements that 
enhance expression of introduced genes in plant cells and plant tissues. 

It is an object of the invention to provide a recombinant promoter molecule that provides 
for reliably high levels of expression of introduced genes in target cells. It is yet another object 
of the invention to provide heterologous upstream enhancer elements that can enhance the 
activity of any promoter. 

It is yet another object of the invention to provide plants, plant cells and plant tissues 
containing either or both of the recombinant promoter or upstream element of the invention. 

It is yet another object of the invention to provide vehicles for transformation of plant 
cells, including viral or plasmid vectors and expression cassettes incorporating the synthetic 
promoter and upstream elements of the invention. 

It is yet another object of the invention to provide bacterial cells comprising such vectors 
for maintenance and plant transformation. 

Other objects of the invention will become apparent from the description of the invention 
which follows. 



DESCRIPTION OF THE FIGURES 

Figure 1 is a depiction of a typical nucleotide base arrangement of a core promoter 
containing the consensus sequences of TATA and INR motifs present in plant promoters. A 
designates +1 of the transcribed region. 

Figure 2 is a depiction of the complete Syn II Core Promoter Sequence with an example 
of a plant promoter; both are aligned at the major start of transcription (bold letter). The 
TATA motif is underlined. The CaMV 35S promoter is shown with percent GC content 
sequences shown in parentheses. 

Figure 3 is the DNA sequence of the Rsyn7 upstream element. The TGACG motifs are 
indicated in bold. 

Figure 4 is a plasmid map of one embodiment of the invention comprising the Syn II Core 
promoter and Rsyn7 elements driving a GUS containing construct. 

Figure 5 depicts several schematics of synthetic promoters according to the present 
invention tested in transient and stable transformants. 

Figure 6 is a depiction of transient assay data using the plasmids incorporating the 
promoter sequences of the invention. 

Figure 7(A) depicts Rsyn7::GUS (PHP6086) activity in T0 maize plants. 

Figure 7(B) is a schematic of VT stage corn plants with sites of tissue samples indicated. 

Figure 8 depicts GUS activity in root segments of a segregating population of maize T1 
transgenic seedlings containing the Rsyn7::GUS (PHP6086) or the UBI::GUS (PHP3953) 
construct. 

Figure 9 depicts GUS expression of three synthetic promoters in T0 transgenic maize 
plants including the promoter sequences of the invention as comparison. 

Figure 10 shows the comparison of the activities of the Rsyn7 promoter, the CaMV 35S 
promoter, and the 35SU-Rsyn7 promoter in transient expression in sunflower cotyledons. 

Figure 11 shows the effect of the Ubi-1 upstream activating region on the strength of the 
Rsyn7 promoter in transient expression in sunflower cotyledons. 

Figure 12 shows that in the stably transformed sunflower callus, GUS expression under 
the control of the 35SU-Rsyn7 promoter is 20% higher than when under the control of the 35S 
CaMV promoter. 

Figure 13 shows the effect of the Ubi-1 upstream activating region on the activity of the 
Rsyn7 promoter in transgenic sunflower callus assay. 

DETAILED DESCRIPTION OF THE INVENTION 

In the description that follows a number of terms are used extensively. The following 
definitions are provided in order to remove ambiguities in the intent or scope of their usage in 
the specification and claims, and to facilitate understanding of the invention. 

A structural gene is a DNA sequence that is transcribed into messenger RNA (mRNA) 
which is then translated into a sequence of amino acids characteristic of a specific polypeptide. 

A promoter is a DNA sequence that directs the transcription of a structural gene. 
Typically a promoter is located in the 5' region of a gene, proximal to the transcriptional start site 
of a structural gene. The promoter of the invention comprises at least a core promoter as defined 
below. Additionally, the promoter may also include at least one upstream element. Such 
elements include UARs and, optionally, other DNA sequences that affect transcription of a 
structural gene, such as a synthetic upstream element. 

A core promoter or minimal promoter contains the essential nucleotide sequences for 
expression of the operably linked coding sequence, including the TATA box and start of 
transcription. By this definition, a core promoter may or may not have detectable activity in the 
absence of specific sequences that may enhance the activity or confer tissue specific activity. For 
example, the maize SGB6 gene core promoter consists of about 37 nucleotides 5' of the 
transcriptional start site of the SGB6 gene, while the Cauliflower Mosaic Virus (CaMV) 35S core 
promoter consists of about 33 nucleotides 5' of the transcriptional start site of the 35S genome. 

ADH refers generally to a plant expressible alcohol dehydrogenase gene and specifically 
to the alcohol dehydrogenase gene from maize. 

ADH1 UAR refers to the DNA fragment spanning the region between nucleotide 
positions about -1094 to about -106 of the alcohol dehydrogenase gene I from maize, or a 
homologous fragment that is functionally equivalent. The sequence is numbered with the start 
of transcription site designated as +1 according to the correction published by Ellis et al. (1987) 
supra. 

"TATA to start" shall mean the sequence between the primary TATA motif and the start 
of transcription. 
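
For illustration only, the definition above can be expressed as a short Python sketch. The function name, the 0-based index used for the +1 base, and the choice to begin the region immediately after a caller-supplied TATA motif are assumptions made for this example; the specification does not prescribe any computational procedure, and the sequence in the usage note is invented.

```python
def tata_to_start(promoter: str, tss_index: int, tata_motif: str) -> str:
    """Return the bases lying between a TATA motif and the start of transcription.

    promoter   -- promoter sequence written 5' to 3'
    tss_index  -- 0-based index of the +1 base (assumed to come from annotation)
    tata_motif -- the TATA motif to look for, e.g. "TATATAA" (supplied by the caller)
    """
    seq = promoter.upper()
    motif = tata_motif.upper()
    # rfind restricted to seq[0:tss_index] picks the motif occurrence that ends
    # at or before the start site and lies nearest to it.
    motif_pos = seq.rfind(motif, 0, tss_index)
    if motif_pos < 0:
        raise ValueError("no TATA motif found upstream of the start site")
    # The region runs from the base after the motif up to, but excluding, +1.
    return seq[motif_pos + len(motif):tss_index]
```

With the invented sequence "ACGTTATATAAGGCGCCGGCACC", a start index of 20 and the motif "TATATAA", the function returns the nine bases between the motif and the start site.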
A synthetic DNA is an artificially created DNA sequence that is not produced naturally, 
and must be introduced to an organism or to an ancestor of that organism to control or to be 
expressed. 

OCS element refers to the TGACG motif identified from the octopine synthase gene, 
histone genes, enzyme genes for agropine biosynthesis, the mannopine synthase gene, the CaMV 
35S gene, histone H3 gene and nopaline synthase gene. As used herein the term includes any 
sequence capable of binding the ASF-1 factor as identified in U.S. Patent No. 4,990,607 by 
Katagiri, the disclosure of which is incorporated by reference. 

UAR is typically a position or orientation dependent element that primarily directs tissue, 
cell type, or regulated expression. 

An enhancer is a DNA regulatory element that can increase efficiency of transcription 
regardless of the distance or orientation of the enhancer relative to the start site of transcription. 

The term expression refers to biosynthesis of a gene product. In the case of a structural 
gene, expression involves transcription of the structural gene into mRNA and then translation of 
the mRNA into one or more polypeptides. 

A cloning vector is a DNA molecule such as a plasmid, cosmid or bacterial phage that 
has the capability of replicating autonomously in a host cell. Cloning vectors typically contain 
one or a small number of restriction endonuclease recognition sites at which foreign DNA 
sequences can be inserted in a determinable fashion without loss of essential biological function 
of the vector, as well as a marker gene that is suitable for use in the identification and selection 
of cells transformed with the cloning vector. Marker genes typically include genes that provide 
tetracycline resistance, hygromycin resistance or ampicillin resistance. 

An expression vector is a DNA molecule comprising a gene that is expressed in a host 
cell. Typically gene expression is placed under the control of certain regulatory elements 
including promoters, tissue specific regulatory elements, and enhancers. Such a gene is said to 
be "operably linked to" the regulatory elements. 

A recombinant host may be any prokaryotic or eukaryotic cell that contains either a 
cloning vector or an expression vector. This term also includes those prokaryotic or eukaryotic 
cells that have been genetically engineered to contain the cloned genes in the chromosome or 
genome of the host cell. 

A transgenic plant is a plant having one or more plant cells that contain an expression 
vector. 

It will be understood that there may be minor sequence variations within sequences or 
fragments used or disclosed in this application. By "minor variations" is intended that the 
sequences have at least 80%, preferably 90%, sequence identity. These variations may be 
determined by standard techniques to enable those of ordinary skill in the art to manipulate and 
bring into utility the functional units of the promoter elements necessary to direct initiation of 
transcription in the structural gene followed by a plant expressible transcription termination (and 
perhaps polyadenylation) signal. 
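
As a rough illustration of the 80% and 90% identity figures, the following Python sketch computes percent identity for two sequences that are assumed to be already aligned and of equal length, with no gaps. This is a simplification chosen for the example; the "standard techniques" referred to above would normally involve an alignment step, which is not shown. The function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two pre-aligned, equal-length sequences agree."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def is_minor_variation(a: str, b: str, min_identity: float = 80.0) -> bool:
    """True if the two sequences meet the identity threshold (80% by default)."""
    return percent_identity(a, b) >= min_identity
```

Passing min_identity=90.0 applies the preferred threshold instead of the minimum one.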
Plant tissue includes differentiated and undifferentiated tissues of plants, including but 
not limited to roots, stems, shoots, leaves, pollen, seeds, tumor tissue and various forms of cells 
and culture such as single cells, protoplasts, embryos, and callus tissue. The plant tissue may be 
in plants or in organ, tissue or cell culture. 

One embodiment of the invention, the core promoter, is shown in SEQ ID NO:1. The 
core promoter is capable of driving expression of a coding sequence in a target cell, particularly 
plant cells. The core promoter finds use in driving expression of sequences which are only 
needed at minimal levels in the target cells. Also disclosed is a novel upstream element, SEQ 
ID NO:2, that helps to potentiate transcription. The synthetic core promoter can be used with 
combinations of enhancer, upstream elements, and/or activating sequences from the 5'-flanking 
regions of plant expressible structural genes. Similarly the upstream element can be used in 
combination with various plant core promoter sequences. In one embodiment the core promoter 
and upstream element are used together to obtain ten-fold higher expression of an introduced 
marker gene in monocot transgenic plants than is obtained with the maize ubiquitin 1 promoter. 

The core promoter comprises a TATA motif and a GC-rich "TATA to start of 
transcription" region (64% or greater GC content) that is generally characteristic of animal 
promoters. The sequence is placed 5' of a structural gene and will promote constitutive 
expression which is non-tissue specific in transgenic plant cells. 
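
The 64% figure can be checked for any candidate region with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the input is assumed to be the "TATA to start" region written as a plain string of A, C, G and T, and the example sequences in the usage note are invented rather than taken from the sequence listing.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C (case-insensitive)."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def is_gc_rich(region: str, threshold: float = 0.64) -> bool:
    """True if the region meets the 64%-or-greater GC criterion."""
    return gc_fraction(region) >= threshold
```

For example, the invented region "GCGCGCGATA" has 7 G or C bases out of 10 and so satisfies the criterion, whereas "ATATGC" (2 out of 6) does not.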

The invention also comprises an expression cassette comprising (the upstream element) 
the synthetic core promoter, a structural gene, the expression of which is desired in plant cells, 
and a polyadenylation or stop signal. The expression cassette can be encompassed in plasmid 
or viral vectors for transformation of plant protoplast cells. 

The invention also encompasses transformed bacterial cells for maintenance and 
replication of the vector, as well as transformed monocot or dicot cells and ultimately transgenic 
plants. 

In another embodiment, the invention encompasses an upstream element that can be used 
in combination with the synthetic promoter or with other known promoters in the art. The 
upstream element comprises at least 3 OCS binding motifs (TGACG) with a novel intervening 
sequence. One embodiment is disclosed in SEQ ID NO:2 and is placed 5' to a core promoter 
sequence to enhance the transcription levels of the resulting gene product. Thus the invention 
comprises an expression cassette comprising the synthetic upstream element of the invention, 5' 
to a plant inducible promoter which is 5' to a structural gene. This expression cassette can be 
embodied in vectors and plasmids as earlier described. 
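
By way of example, a candidate upstream element can be screened for the "at least 3 TGACG motifs" criterion as sketched below in Python. Counting occurrences on the complementary strand (which appear as CGTCA in the written strand) is an assumption made for this example and is not stated in the text above, so it is exposed as an option. The function names are hypothetical and the test sequence is invented; it is not SEQ ID NO:2.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str = "TGACG") -> tuple[int, int]:
    """Return (forward, reverse) counts of motif in seq.

    A zero-width lookahead is used so that overlapping occurrences are counted.
    The reverse count is the number of occurrences of the motif's reverse
    complement in the written strand.
    """
    seq = seq.upper()
    forward = len(re.findall(f"(?={re.escape(motif.upper())})", seq))
    reverse = len(re.findall(f"(?={re.escape(reverse_complement(motif))})", seq))
    return forward, reverse


def has_minimum_ocs_motifs(seq: str, minimum: int = 3, both_strands: bool = True) -> bool:
    """True if seq carries at least `minimum` TGACG motifs.

    both_strands=True (an assumption of this sketch) also counts motifs that
    lie on the complementary strand.
    """
    forward, reverse = count_motif(seq)
    total = forward + reverse if both_strands else forward
    return total >= minimum
```

For the invented sequence "AATGACGTTTGACGCCCGTCAGGTGACGAA", count_motif reports three forward motifs and one on the complementary strand.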

In a preferred embodiment the synthetic upstream element is used in combination with 
the synthetic core promoter sequence to achieve non-tissue specific constitutive expression of the 
gene product which is a ten-fold enhancement over the maize Ubi-1 promoter. 

The present invention also encompasses a promoter construct comprising the synthetic 
core promoter described above and an upstream activating region. The upstream activating 
region is different from the synthetic upstream element. Preferably the upstream activating 
region is an upstream activating region (UAR) having substantial sequence similarity to the UAR 
of CaMV 35S or maize Ubi-1. Promoter constructs of the invention may comprise the synthetic 
core promoter in combination with at least one UAR and optionally at least one synthetic 
upstream element. 

The promoter construct can be contained for convenience in an expression cassette. This 
expression cassette can be embodied in transformation vectors. 

The sequence of the upstream activating region (UAR) of the maize Ubi-1 gene is also 
provided. This UAR can be used in combination with any core promoter to enhance the activity 
of the promoter. 

The promoter of the invention as seen in SEQ ID NO:1 and/or SEQ ID NO:10 (modified 
core promoter) can be used to obtain high levels of expression of structural genes. Similarly the 
upstream element of the invention (SEQ ID NO:2) can be used in combination with other 
promoters or the promoter of the invention to potentiate levels of transcription in genetically 
modified plants. Production of a genetically modified plant tissue expressing a structural gene 
under the control of the regulatory elements of the invention combines teachings of the present 
disclosure with a variety of techniques and expedients known in the art. In most instances 
alternate expedients exist for each stage of the overall process. The choice of expedients depends 
on variables such as the plasmid vector system chosen for the cloning and introduction of the 
recombinant DNA molecule, the plant species to be modified, the particular structural gene, 
promoter elements and upstream elements used. Persons skilled in the art are able to select and 
use appropriate alternatives to achieve functionality. Culture conditions for expressing desired 
structural genes and cultured cells are known in the art. Also as known in the art, a number of 
both monocotyledonous and dicotyledonous plant species are transformable and regenerable such 
that whole plants containing and expressing desired genes under regulatory control of the 
promoter molecules and upstream elements of the invention may be obtained. As is known to 
those of skill in the art, expression in transformed plants may be tissue specific and/or specific 
to certain developmental stages or environmental influences. Truncated promoter selection and 
structural gene selection are other parameters which may be optimized to achieve desired plant 
expression as is known to those of skill in the art and taught herein. 

The nucleotide sequences of the invention can be introduced into any plant. The genes 
to be introduced can be conveniently used in expression cassettes for introduction and expression 
in any plant of interest. 

Such expression cassettes will comprise the transcriptional initiation region of the 
invention linked to a nucleotide sequence of interest. Such an expression cassette is provided 
with a plurality of restriction sites for insertion of the gene of interest to be under the 
transcriptional regulation of the regulatory regions. The expression cassette may additionally 
contain selectable marker genes. 

The transcriptional cassette will include, in the 5'-3' direction of transcription, a 
transcriptional and translational initiation region, a DNA sequence of interest, and a 
transcriptional and translational termination region functional in plants. The termination region 
may be native with the transcriptional initiation region, may be native with the DNA sequence 
of interest, or may be derived from another source. Convenient termination regions are available 
from the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase 
termination regions. See also Guerineau et al. (1991) Mol. Gen. Genet. 262:141-144; Proudfoot 
(1991) Cell 64:671-674; Sanfacon et al. (1991) Genes Dev. 5:141-149; Mogen et al. (1990) Plant 
Cell 2:1261-1272; Munroe et al. (1990) Gene 91:151-158; Ballas et al. (1989) Nucleic Acids Res. 
17:7891-7903; Joshi et al. (1987) Nucleic Acids Res. 15:9627-9639. 
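
The 5'-3' ordering of the cassette described above can be made concrete with a small data structure. The Python sketch below is only a bookkeeping illustration under stated assumptions: the class name, field names and the optional tuple of upstream elements are hypothetical, and the sequences in the test are placeholders rather than functional DNA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionCassette:
    """Components of a cassette, held as plain 5'-to-3' sequence strings."""

    initiation_region: str      # transcriptional and translational initiation region
    sequence_of_interest: str   # DNA sequence of interest
    termination_region: str     # termination region functional in plants
    upstream_elements: tuple = ()  # optional elements placed 5' to the initiation region

    def parts(self) -> list:
        """Return (label, sequence) pairs in the 5'-3' direction of transcription."""
        ordered = [("upstream element", u) for u in self.upstream_elements]
        ordered.append(("initiation region", self.initiation_region))
        ordered.append(("sequence of interest", self.sequence_of_interest))
        ordered.append(("termination region", self.termination_region))
        return ordered

    def assemble(self) -> str:
        """Concatenate the components in 5'-3' order."""
        return "".join(seq for _, seq in self.parts())
```

Keeping the components separate until assembly makes it straightforward to swap in a termination region from another source, as the paragraph above contemplates.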

The genes of the invention are provided in expression cassettes for expression in the plant 
of interest. The cassette will include 5' and 3' regulatory sequences operably linked to the gene 
of interest. The cassette may additionally contain at least one additional gene to be 
cotransformed into the organism. Alternatively, the additional gene(s) can be provided on 
another expression cassette. Where appropriate, the gene(s) may be optimized for increased 
expression in the transformed plant. That is, the genes can be synthesized using plant preferred 
codons for improved expression. Methods are available in the art for synthesizing plant preferred 
genes. See, for example, U.S. Patent Nos. 5,380,831, 5,436,391, and Murray et al. (1989) 
Nucleic Acids Res. 17:477-498, herein incorporated by reference. 

Additional sequence modifications are known to enhance gene expression in a cellular 
host. These include elimination of sequences encoding spurious polyadenylation signals, 
exon-intron splice site signals, transposon-like repeats, and other such well-characterized 
sequences which may be deleterious to gene expression. The G-C content of the sequence may be 
adjusted to levels average for a given cellular host, as calculated by reference to known genes 
expressed in the host cell. When possible, the sequence is modified to avoid predicted hairpin 
secondary mRNA structures. 

The selection of an appropriate expression vector will depend upon the method of 
introducing the expression vector into host cells. Typically an expression vector contains (1) 
prokaryotic DNA elements coding for a bacterial replication origin and an antibiotic resistance 
gene to provide for the amplification and selection of the expression vector in a bacterial host; 
(2) DNA elements that control initiation of transcription such as a promoter; (3) DNA elements 
that control the processing of transcripts such as introns and transcription 
termination/polyadenylation sequences; and (4) a reporter gene that is operatively linked to the 
DNA elements to control transcription initiation. Useful reporter genes include β-glucuronidase, 
β-galactosidase, chloramphenicol acetyl transferase, luciferase, green fluorescent protein (GFP) 
and the like. Preferably the reporter gene is either β-glucuronidase (GUS), GFP or luciferase. 
The general descriptions of plant expression vectors and reporter genes can be found in Gruber 
et al., "Vectors for Plant Transformation," in Methods in Plant Molecular Biology & 
Biotechnology, Glick et al. (Eds.), pp. 89-119, CRC Press, 1993. Moreover, GUS expression 
vectors and GUS gene cassettes are available from Clontech Laboratories, Inc., Palo Alto, 
California, while luciferase expression vectors and luciferase gene cassettes are available from 
Promega Corp. (Madison, Wisconsin). 

Expression vectors containing genomic or synthetic fragments can be introduced into 
protoplasts or into intact tissues or isolated cells. Preferably expression vectors are introduced 
into intact tissue. General methods of culturing plant tissues are provided, for example, by Maki 
et al., "Procedures for Introducing Foreign DNA into Plants," in Methods in Plant Molecular 
Biology & Biotechnology, Glick et al. (Eds.), pp. 67-88, CRC Press, 1993; and by Phillips et al., 
"Cell-Tissue Culture and In-Vitro Manipulation," in Corn & Corn Improvement, 3rd Edition, 
Sprague et al. (Eds.), pp. 345-387, American Society of Agronomy Inc. et al., 1988. 

Methods of introducing expression vectors into plant tissue include the direct infection 
or co-cultivation of plant cells with Agrobacterium tumefaciens, Horsch et al., Science, 227:1229 
(1985). Descriptions of Agrobacterium vector systems and methods for Agrobacterium-mediated 
gene transfer are provided by Gruber et al., supra. 

Preferably, expression vectors are introduced into maize or other plant tissues using a 
direct gene transfer method such as microprojectile-mediated delivery, DNA injection, 
electroporation and the like. More preferably expression vectors are introduced into plant tissues 
using microprojectile-mediated delivery with the biolistic device. See, for example, Tomes et 
al., "Direct DNA transfer into intact plant cells via microprojectile bombardment," in Gamborg 
and Phillips (Eds.), Plant Cell, Tissue and Organ Culture: Fundamental Methods, Springer- 
Verlag, Berlin (1995). 

The vectors of the invention can be used not only for expression of structural genes but 
also in exon-trap cloning or promoter trap procedures to detect differential gene 
expression in varieties of tissues: K. Lindsey et al., 1993, "Tagging Genomic Sequences That 
Direct Transgene Expression by Activation of a Promoter Trap in Plants", Transgenic Research 
2:33-47; D. Auch & Reth, et al., "Exon Trap Cloning: Using PCR to Rapidly Detect and Clone 
Exons from Genomic DNA Fragments", Nucleic Acids Research, Vol. 18, No. 22, p. 6743. 

This inventive promoter is based in part on the discovery that a GC-rich "TATA to start" 
region in a plant promoter acts as a very strong non-tissue-specific core promoter inducing 
constitutive expression in plant cells. The TATA element of plant promoters of polII genes 
generally has the sequence TATA(A/T)A(A/T)A, SEQ ID NO:3, whereas the consensus of the 
start of transcription consists of the sequence 5'...TYYTCAT(A/C)AA..3', SEQ ID NO:3, where 
the A designates the starting base for transcription. The typical plant promoter sequence is 
depicted in Figure 1. 
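
Consensus sequences of this kind can be turned into search patterns. The Python sketch below is illustrative and rests on two stated assumptions: the start-of-transcription consensus is read as TYYTCAT(A/C)AA with Y standing for C or T, and the TATA consensus is read as TATA(A/T)A(A/T)A. The consensus strings are written here with the standard one-letter ambiguity codes (W for A or T, M for A or C); the helper name and the test sequences are invented for the example.

```python
import re

# Standard nucleotide ambiguity codes, limited to those needed here.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]",  # pyrimidine
    "W": "[AT]",  # A or T
    "M": "[AC]",  # A or C
}


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Compile a consensus written with ambiguity codes into a regular expression."""
    return re.compile("".join(IUPAC[code] for code in consensus.upper()))


# Assumed readings of the two consensus sequences discussed in the text.
TATA_CONSENSUS = consensus_to_regex("TATAWAWA")    # TATA(A/T)A(A/T)A
INR_CONSENSUS = consensus_to_regex("TYYTCATMAA")   # TYYTCAT(A/C)AA
```

Calling INR_CONSENSUS.search(sequence) on an upper-case sequence returns a match object whose start() gives the position of the first base of the motif, or None when the consensus is absent.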

Sequences intervening between the TATA element and the start of transcription have been 
shown to play a significant role in transcriptional activation efficiency. The TATA binding 
protein has been shown to interact with the minor groove of the double helix, binding to the 
TATA motif and bending it towards the major groove side (Kim et al., 1993, Nature, 365:512-520). 
It thus follows that sequences downstream of the TATA motif that impact this binding will affect 
the efficiency of stable transcriptional complex formation and ultimately expression. Surveys of 
the "TATA to start" regions of plant promoters show a significantly higher level of AT-rich 
sequences, leading to the potential of minor groove compression (Yaurawj et al., Biological 
Abstracts Vol. 47, Issue 8, Ref. 144712, "Consensus Sequences for Plant Minimal Promoters", 
Annual Meeting of the American Society of Plant Physiologists, July 29-August 2, 1995, Plant 
Physiology 108 [2 Supp.] 1995, 114). Generally, animal promoters show a GC-rich "TATA to 
start" sequence that leads to a major groove compression, suggesting that average plant and 
animal core promoter transcriptional complexes recognize and interact with a somewhat 
different TATA to start structure with the corresponding sequence difference. Quite surprisingly, 
the applicant has found that a GC-rich animal type synthetic promoter works very well in plants. 

While the invention is not bound by any theory, it is possible that the AT-rich TATA
motif present in a GC-rich sequence may "present itself" more prominently to the TATA-binding
complex by a sharp demarcation of the TATA motif that would interact more tightly with the
TATA-binding complex. This would improve the efficiency of the start of transcription by shifting
the equilibria of binding to a more stabilized form, whereas in the "non-bounded TATA" version,
i.e. one having a higher level of AT-rich sequences flanking the TATA motif, the TATA-binding
complex would potentially slide or stutter 5' or 3' to the start site, effectively reducing the
efficiency of binding and ultimately reducing transcription. Little data regarding this region of
plant promoters is available except crude deletions and some point mutations. The obvious design of
a synthetic core promoter for plant expression would include the AT-rich "TATA to start"
sequence based on surveys of known plant promoters. However, based on the "bounded"
mechanism, it is postulated that a more efficient core promoter results from a TATA motif
embedded in a GC-rich sequence.

Figure 2 depicts the Syn II Core promoter sequence, SEQ ID NO:1, of the invention with
examples of plant core promoters aligned to the major start of transcription. Another example
of a plant promoter, 35S of CaMV (SEQ ID NO:4), is shown, with percent GC content
shown at the right in parentheses. The Syn II Core sequence does not show any significant
sequence homology to sequences in the public sequence databases.

The synthetic Syn II Core promoter sequence shows a 64% GC-rich "TATA to start"
sequence, different from the overall 40% GC-rich sequence present in traditional plant promoters
(CaMV 35S, for example). The naturally occurring and isolated UBI core promoter, which
potentiates very high levels of activity in monocots, usually shows a 64% GC-rich "TATA to
start" sequence more similar to animal promoters. Such examples provided the impetus to design
a high GC-rich "TATA to start" sequence for efficient transcription, in opposition to the current
dogma of plant core promoters.

Thus the invention comprises a synthetic plant core promoter sequence comprising a
TATA motif and a "TATA to start" region that is 64% GC-rich or greater. In a preferred
embodiment, the promoter may include restriction endonuclease target sites for ease of cloning.
In the most preferred embodiment, the sequence is that of SEQ ID NO:1. As will be appreciated
by those of skill in the art, several base transversions within SEQ ID NO:1 may occur which will
maintain the percent GC-content and are intended within the scope of this invention. For
example, guanines could be replaced with cytosines and vice versa without affecting the overall
efficacy of the promoter, so long as the percent GC-content is maintained.
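
By way of illustration only, the percent GC criterion and the interchange of guanines and cytosines described above can be sketched in Python as follows. The function names and the short example region are hypothetical; the region is not taken from SEQ ID NO:1.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


def swap_g_and_c(seq: str) -> str:
    """Replace every G with C and every C with G (a G<->C transversion)."""
    return seq.upper().translate(str.maketrans("GC", "CG"))


# Hypothetical 10-base "TATA to start" region, for illustration only.
region = "GCGCATGCCG"
swapped = swap_g_and_c(region)

# The swap changes the sequence but leaves the percent GC-content unchanged.
print(region, gc_percent(region))
print(swapped, gc_percent(swapped))
```

Under this sketch, a candidate region would satisfy the criterion of the preceding paragraph when `gc_percent` returns 64 or greater.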

In another embodiment, the invention comprises a synthetic upstream element positioned 
5' to any naturally occurring or synthetic promoter for use in plants, particularly maize gene 
expression. 

From the activity of numerous promoters, basic elements (binding sites) have been
defined. These include, for example, AT-rich regions from heat shock promoters and ASF-1
binding site (AS-1) elements present in octopine synthase (OCS) and Cauliflower Mosaic Virus
promoters. AS-1 is one of the better known upstream elements, and its binding sequence (OCS
element) is present in many constitutive plant promoters such as the CaMV 35S, A. tumefaciens
NOS and OCS, and wheat histone promoters. The OCS element was first isolated as an enhancer
element in the promoter of the OCS gene, where it was identified as a 16-base pair palindromic
sequence (Ellis et al. (1987) EMBO J. 6:11-16), but has been reduced to its essential features as
a TGACG motif. See U.S. Patent No. 4,990,607, incorporated herein by reference. The upstream
element of the invention has a 71% identity to the promoter enhancer element disclosed in U.S.
Patent 5,023,179 to Lam et al. The two sequences are quite different in their flanking sequences
surrounding the TGACG motif, which regions have been shown to impact the level of
transcription enhancement. The transcriptional enhancing activity of the OCS element correlates
with the in vitro binding of a transcriptional factor. Similar elements were also identified in the
promoter regions of six other T-DNA genes involved in opine synthesis and three plant viral
promoters including the CaMV 35S promoter (Bouchez et al. 1989, supra). These elements were
shown to bind the OCS transcription factor in vitro and enhance transcription in plant cells.

In tobacco, a DNA binding factor, TGA1, was shown to interact specifically with the AS-1
element either alone or in conjunction with other promoter elements (Katagiri et al. 1989,
Nature 340:727-730). This factor was also shown to be expressed in a root-preferred manner in
tobacco plants. Core promoters with one or two copies of the OCS upstream element tend to
potentiate gene expression, whereas 4 or more repeats of this element produce more or less
constitutive activity, albeit low relative to intact 35S promoters.

Thus the invention incorporates a synthetic upstream element which can be used with the 
core promoter of the invention or other core promoters to enhance gene expression. The element 
incorporates three OCS-like motifs and novel intervening sequences which enhance gene 
expression. 

Figure 3 (SEQ ID NO:2) shows the complete sequence of one embodiment (Rsyn7) of
the synthetic upstream element, which incorporates at least three TGACG (SEQ ID NO:5)
OCS-like motifs, which are indicated in bold.
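
For illustration, occurrences of the TGACG motif in a candidate upstream element can be located on both strands with a short Python sketch such as the one below. The function names and the test sequence are hypothetical, and the test sequence is not SEQ ID NO:2.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq: str, motif: str = "TGACG") -> list[tuple[int, str]]:
    """List (0-based position, strand) for each motif occurrence.

    "+" hits match the motif as written; "-" hits match its reverse
    complement, i.e. the motif read on the opposite strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


# Hypothetical test sequence with one motif on each strand.
example = "AATGACGTTCGTCAAA"
print(find_motif(example))
```

Scanning both strands is a design choice of this sketch, made because a motif read on the complementary strand appears as CGTCA in the sequence as written.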

Sequences flanking many elements such as the TGACG (SEQ ID NO:5) motif have been
shown to have profound impacts on binding affinities of DNA binding factors and thus play as
important a role as the central motifs themselves (Burrows et al. 1992, Plant Molecular
Biology 19:665-675; Schindler et al. 1992, Plant Cell 4:1309-1319; Foster et al. 1994, FASEB J.
8:192-200). The novel sequences flanking the TGACG motifs in the Rsyn7 promoter have been
determined and have established clear enhancement of transcriptional activity with various
promoters, particularly when used with the Syn II Core promoter.

The Rsyn7 upstream element has been cloned upstream of the Syn II Core promoter driving
a GUS construct and has yielded levels of GUS activity in transgenic maize plants approximately
ten-fold higher than the ubiquitin promoter, the strongest maize promoter to date.

In yet another aspect of the present invention, at least one upstream activating region
(UAR), which is different from the synthetic upstream element, is operably linked to the synthetic
core promoter. The UAR may be used alone or in combination with the synthetic upstream
element described herein. Preferably, the upstream activating regions of the cauliflower mosaic
virus (CaMV) 35S promoter and the maize Ubi-1 gene promoter are utilized. Additionally,
sequences having sequence similarity to these UARs may be utilized as long as such sequences
retain the ability to enhance promoter activity. Enhancement can be measured by assaying for
levels of transcripts or, alternatively, protein production.

CaMV 35S UARs have been well studied in the art. The complete nucleotide sequence
of the CaMV circular double-stranded DNA has been established in the art. See Guilley et al.
(1980) Cell 21:285-294. The 35S promoter transcribes the major 35S RNA transcript from the
circular viral genome by nuclear RNA polymerase II. See Guilley et al. (1982) Cell 30:763-773.
Moreover, the 35S UARs can function with a heterologous promoter and increase expression
of a gene of interest in cells and transgenic plants. Shah et al. (1986) Science 233:478-481.
Multiple cis regulatory elements for the activity of the CaMV 35S promoter have been identified.
See Odell et al. (1985) Nature 313:810-812; Fang et al. (1989) Plant Cell 1:141-150.

In the present invention, a large fragment of the upstream activating regions (UARs) of
the CaMV 35S promoter can be utilized to enhance the activity of the core synthetic promoter.
The size of the UAR can, for example, range from about 15 base pairs to about 850 base pairs,
preferably from about 20 to about 500, more preferably from about 20-25 to about 50-200 base
pairs. A preferred region of the 35S CaMV upstream region includes sequences from about -421
to about -90. It is recognized that modifications in length and nucleotide sequence can be made
to the region and still result in enhanced activity of the core synthetic promoter. Such
modifications can be tested for effect on activity by using expression systems as set forth in the
Experimental Section of the present application. The numbers on the UAR sequence diagram
indicate the position upstream from the transcription start site, or +1 position, of the 35S structural
gene. For example, -25 means a position 25 base pairs upstream from the transcription start site
of the 35S structural gene.
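
As a non-limiting sketch of the numbering convention just described, the following Python function extracts a region given in positions relative to the transcription start site. It assumes, as a labeled convention of this sketch only, that the +1 base sits at a known 0-based index, that -1 is the base immediately 5' of it, and that there is no position 0. The function name and the toy sequence are hypothetical.

```python
def extract_upstream_region(seq: str, tss_index: int, start: int, end: int) -> str:
    """Return bases from position `start` to `end`, inclusive.

    Positions are negative numbers counted upstream from the transcription
    start site (+1), which is located at 0-based index `tss_index` in `seq`.
    """
    if not (start <= end < 0):
        raise ValueError("expected start <= end < 0")
    lo = tss_index + start      # 0-based index of the 5'-most base
    hi = tss_index + end + 1    # exclusive upper bound for slicing
    if lo < 0:
        raise ValueError("sequence does not extend far enough upstream")
    return seq[lo:hi]


# Toy sequence with the +1 base at index 8 (hypothetical).
toy = "ACGTACGTAC"
print(extract_upstream_region(toy, tss_index=8, start=-5, end=-2))

# Length of a region spanning -421 to -90, counted inclusively.
print(-90 - (-421) + 1)
```

Counted this way, a region running from -421 to -90 spans 332 base pairs.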



WO 99/43838    PCT/US99/03863

The upstream activating region of the maize ubiquitin gene Ubi-1 can also be utilized in
the invention. The sequence of the Ubi-1 gene transcription regulatory region is disclosed in US
Patent No. 5,510,474. See also Christensen et al. (1992) Plant Mol Biol 18:675-689; Cornejo
et al. (1993) Plant Mol Biol 23:567-581; Takimoto et al. (1994) Plant Mol Biol 26:1007-1012;
and Christensen et al. (1996) Transgenic Res. 5:213-218. The UAR of the Ubi-1 gene promoter
comprises preferably from about -867 to about -54. As indicated above for the 35S UAR,
modifications of the Ubi-1 UAR that still function to enhance the activity of the core promoter
are encompassed.

While the full sequence of the ubiquitin promoter has been published, this is the first
disclosure of the Ubi UAR. Thus, the invention discloses the UAR of the ubiquitin promoter as
well as the Ubi UAR in combination with any promoter. Additionally, methods for using the Ubi
UAR to enhance activity of promoters are encompassed.

The upstream activating regions as described herein can be linked with the synthetic core
promoter and/or other upstream elements by any conventional method that is generally known
in the art, as long as an operative element or promoter is constructed. The upstream activating
regions are generally operably linked to the 5' end of the core promoter. When the synthetic
upstream element is also present, the upstream activating regions can be linked to either the 5'
end of the synthetic upstream element, the 3' end of the core synthetic promoter, or inserted
between the synthetic upstream element and the synthetic core promoter. In a preferred
embodiment, the upstream activating regions are linked in close proximity to the synthetic
upstream element, if present, and the synthetic core promoter. By close proximity is intended
within from about 1 to about 50 nucleotides. However, it is recognized that more than 50
nucleotides may separate the elements. The upstream activating regions can be in the 5' to 3'
direction or the 3' to 5' direction, but preferably are in the 5' to 3' direction at the 5' end of the
synthetic core promoter or the synthetic upstream element.

One or multiple copies of the upstream activating regions can be used. When multiple
copies are utilized, they can be tandem repeats of one UAR or combinations of several UARs.
In this manner, the level of expression of a nucleotide sequence of interest can be controlled by
the number of UARs present in the promoter construction, since the results indicate that increased
expression levels are obtained with increased numbers of UARs. Thus, the invention provides
methods for regulating levels of expression of a gene or nucleotide sequence of interest.

As indicated, multiple copies of a UAR can be used to enhance the activity of the
operably linked promoter. As noted, multiple copies of the same or different UARs can be
utilized. For example, any combination of CaMV 35S UARs and maize Ubi-1 gene UARs can
be utilized.
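
The arrangement of elements described above, with one or more UAR copies placed 5' of an optional synthetic upstream element and the core promoter, can be sketched in Python as follows. This is an illustration of the ordering and orientation only; the class, the function names and the placeholder sequences are all hypothetical and do not correspond to any sequence of the invention.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Element:
    name: str
    sequence: str
    reverse: bool = False  # True: element is inserted in the 3' to 5' direction


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def assemble_promoter(uars: Sequence[Element],
                      upstream_element: Optional[Element],
                      core: Element) -> str:
    """Join UAR copies, an optional upstream element and the core promoter, 5' to 3'."""
    parts = list(uars)
    if upstream_element is not None:
        parts.append(upstream_element)
    parts.append(core)
    return "".join(
        reverse_complement(p.sequence) if p.reverse else p.sequence
        for p in parts
    )


# Placeholder sequences, for illustration only.
uar = Element("UAR", "AAC")
core = Element("core", "TATA")

# Two tandem UAR copies linked 5' of the core promoter.
print(assemble_promoter([uar, uar], None, core))
```

Varying the number of entries in `uars` corresponds to varying the number of UAR copies in the promoter construction.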

The promoters of this invention having one or more UARs as described above can be
provided in expression cassettes, and such cassettes contained in plasmid or viral vectors. Such
vectors can be used for transformation of bacteria and plant cells. Transgenic plants can be
ultimately regenerated from such transformed plant cells.

The UARs incorporated into plant promoters can substantially enhance transcription
activity in transgenic plants. For example, one or more copies of the upstream activating region
of the maize Ubi-1 gene can be operably linked to a promoter having the core synthetic promoter
sequence and the synthetic upstream element of the invention. The promoter constructs of the
invention can be operably linked to any nucleotide sequence or gene of interest. The promoter
construct, for example, can be used to enhance oxalate oxidase gene expression in transgenic
plants. Oxalate oxidase is a plant enzyme implicated in plant defense mechanisms against
pathogen attack. The enzyme degrades the chemical compound oxalic acid secreted by plant
pathogens. See e.g., PCT Publication No. WO 92/14824. Increasing the oxalate oxidase level
in plants such as sunflower will lead to increased plant resistance to plant pathogens.

The following examples are for illustration purposes only and are intended in no way to
limit the scope or application of the present invention. Those of skill in the art will appreciate
that many permutations can be achieved and are in fact intended to be within the scope of the
invention. All reference citations throughout the specification are expressly hereby incorporated
by reference.

EXAMPLE 1

Plasmids were designed using the multiple cloning site of pBlueScriptIIKS+ from
Stratagene to facilitate cloning of the different combinations of elements. Oligonucleotides
containing the sequences of the elements were synthesized with restriction endonuclease sites at
the ends. Thus elements could be added or removed and replaced as needed. GUS and
Luciferase were used for reporter genes.

For transient assays, plasmid DNA was introduced into intact 3-day-old maize seedlings
by particle bombardment. Following 16 hours incubation at 25°C in the dark, expression was
assayed by measuring GUS enzyme activity in root and shoot extracts from each seedling to
determine if any tissue-preferred expression was demonstrated. GUS activity was measured
using a GUS-Light assay kit from Tropix (47 Wiggins Avenue, Bedford, MA 01730).

Constructs that gave high levels of expression were introduced into a cell line to produce
stable transformants. These stable transformants (T0) were assayed by PCR to determine the
presence of the GUS gene and by MUG (4-methylumbelliferyl-glucuronide) assay to quantify the
activity level of the GUS protein being produced. When the plants were ready to be transferred
to the greenhouse, they were assayed histochemically with X-gluc to determine where the GUS
product was being synthesized. Plants demonstrating preferred expression levels were grown in
the greenhouse to V6 stage.

EXAMPLE 2 

Construction of Plasmids Containing the Syn II Core Promoter. 

Standard molecular biological techniques were carried out according to Maniatis et al.
(1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring
Harbor, N.Y. All plasmids utilized in the invention can be prepared according to the directions
of the specification by a person of ordinary skill in the art without undue experimentation
employing materials readily available in the art.

Oligos N306 (SEQ ID NO:6) 5'-TCGACACTGC AGCTCTAGGG ATGGTAGCGC
AGGGTGCGTA GGTACGTATT TATAGCCGCT CGAGTG-3' and N307 (SEQ ID NO:7) 5'-
GATCCACTCG AGCGGCTATA AATACGTACC TACGCACCCT GCGCTACCAT
CCTAGAGCT GCAGTG-3' were synthesized according to directions on an automated DNA
synthesizer (such as the Applied Biosystems Inc. DNA Synthesizer, Model 380B). These automated
synthesizers are commercially available. The oligos were then ligated to the BamHI fragment
of the pBlueScriptIIKS+ plasmid comprising the β-glucuronidase gene interrupted by the
maize ADH1 intron 1 region. A map of a plasmid incorporating both the Syn II Core promoter
and the upstream element is disclosed as Figure 4. Several other embodiments are shown in
other plasmids depicted in Figure 5. Plasmid numbers are shown to the right of each promoter
diagram, with the corresponding legend placed below the diagrams. The top diagram shows the
complete transgene transcriptional unit, with the subsequent diagrams focusing on the salient
differences between 35S and Syn II Core promoters. The legend shows the number and nature
of the various promoter subelements, the sequence if relatively short, the source of the element
and position relative to the start of transcription.
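
For illustration, whether two synthetic oligos of this kind would anneal into a duplex with single-stranded ends can be checked with a short Python sketch. It assumes, as an inference from the TCGA and GATC prefixes of the oligos above rather than a statement of the specification, that each oligo carries a 4-base 5' overhang. The function names and the short test oligos are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def clean(oligo: str) -> str:
    """Remove the spaces and line breaks used to group bases in print."""
    return "".join(oligo.split()).upper()


def anneals_with_overhangs(top: str, bottom: str, overhang: int = 4) -> bool:
    """Check that two oligos pair everywhere except their 5' overhangs.

    Both oligos are written 5' to 3'. After the first `overhang` bases of
    each are set aside, the remainder of `top` must equal the reverse
    complement of the remainder of `bottom`.
    """
    top, bottom = clean(top), clean(bottom)
    return top[overhang:] == reverse_complement(bottom[overhang:])


# Hypothetical short oligos: 4-base overhangs plus a 5-base duplex.
top_oligo = "TCGA ACCTG"
bottom_oligo = "GATC CAGGT"
print(anneals_with_overhangs(top_oligo, bottom_oligo))
```

The same check could be applied to any oligo pair after removing the grouping spaces, which `clean` does here.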

The sequence of the core promoter consists of 35 base pairs with enzyme sites upstream
of a TATA box and a start of transcription with 10 to 15 base pairs downstream. Upstream
elements (Gal 4-binding sites, Rsyn, AT-GBL etc.) were fused to the core sequence with the ADH1
intron and different marker genes (LUC or GUS) and were demonstrated to be functional both in
transient assays (Figure 6) and in Rsyn stably transformed plants (Figure 7).

EXAMPLE 3 

Construction of Upstream Element Rsyn7 Fused to Syn II Core Promoter Resulting in
Plasmids PHP5903 and PHP6086.

Oligos for constructing the Rsyn7 promoter subelement, N1965 (SEQ ID NO:8)
5'-GATCCTATGA CGTATGGTAT GACGTGTGTT CAAGATGATG ACTTCAAACC
TACCTATGAC GTATGGTATG ACGTGTGTCG ACTGATGACT TA-3' and N1966 (SEQ
ID NO:9) 5'-GATCTAAGTC ATCAGTCGAC ACACGTCATA CCATACGTCA TAGGTAGGTT
TGAAGTCATC ATCTTGAACA CACGTCATAC CATACGTCA TAG-3', were synthesized
as earlier described. The oligos were annealed and cloned into a PHP3398 plasmid upstream of
the Syn II Core sequence and resulted in several versions of the original Rsyn7 sequence due to
spontaneous deletions. The Rsyn7-2 version involved a single base deletion resulting in a 3X
reiterative TGACG motif upstream of the Syn II Core promoter (Rsyn7::LUC, P5903). The LUC
coding sequence was replaced by the GUS coding sequence to produce the Rsyn7::GUS construct
P6086. P6086 was later introduced into transgenic maize, resulting in high levels of constitutive
activity in four of the six active events examined (Figure 7).

The progeny from T0 plants from several transformation events were examined, and GUS
activity ranged from 1 to 400 PPM (micrograms GUS enzyme/GFW in root tissue of 7-day-old
seedlings) (Figure 8). These T0 and T1 plants generally produced 4X-10X greater GUS activity
than plants harboring the ubiquitin::GUS reporter gene.

Thus from the foregoing, it can be seen that the invention accomplishes at least all of its
objectives.

EXAMPLE 4

Transformation and Expression with Syn II Core Promoter and/or Rsyn7 Upstream
Element.

Using transient bombardment assays, the Syn II Core promoter sequence was compared
against the 35S core sequence, either alone or in conjunction with numerous activation elements.
Figure 6 depicts transient assay data using the plasmids incorporating the promoter
sequences of the invention and shows transient GUS or LUC activity in three-day-old maize roots
or BMS callus bombarded with chimeric promoter::GUS or LUC constructs. The -33 CaMV35S
and the Syn II Core promoter versions of the synthetic promoter::GUS (or LUC) constructs were
bombarded into three-day-old roots (or cultured BMS calli as described hereinafter) and assayed
for enzyme activity 20 hours after bombardment. The data shown are the raw enzyme units of
a compilation of at least three experiments and have not been normalized in any fashion due to
the inherent variability of the transient assays. Control plasmids 1654 and 3537 are the LUC
constructs tested in maize BMS calli. There is an approximately 4- to 20-fold difference in transient
activity between the 35S and Syn II Core versions. The Y axis is in log scale. Both core
promoters were driving a GUS-containing construct (Figures 4 and 5) and generated a basal level
of activity (Figure 6). However, when activator elements were placed upstream of the TATA
motif, the Syn II Core provided generally higher levels of activity (2- to 4-fold better) in corn cells
than when the activator elements were placed upstream of the 35S core (Figure 6).

The Syn II Core sequence has been shown to enhance activity in stably transformed
plants. Further, with certain activator sequences upstream of the TATA element, activity levels
in stably transformed corn plants reached levels ten-fold greater than maize ubiquitin constructs,
which produce extremely high levels of activity.

Figures 7 and 8 show GUS activity levels from isolated tissues of VT stage T0 plants and
root tissue from T1 seedlings, respectively. These data demonstrate that this core sequence can
participate in potentiating very high levels of activity as a functional partner for the active
chimeric promoters. Figure 7 shows the Rsyn7::GUS (6086) activity in T0 maize plants. VT
stage plants with ears 3 to 8 days post-pollination were dissected and assayed for GUS activity.
7A depicts GUS expression in designated tissues; 7B depicts a schematic of a corn plant with
sites of measurement indicated. Plants from T0 events that demonstrated a range of activities
with the Rsyn7 promoter were assayed. Log scale again noted. The activity range for UBI::GUS
plants is indicated at the right of the graph for comparison. These data demonstrate that the Rsyn7
promoter can increase activity to ten-fold above levels of the ubiquitin promoter yet shows little
tissue preference, making Rsyn7 suitable as a strong constitutive promoter.

Figure 8 depicts GUS activity in root segments of a segregating population of maize T1
transgenic seedlings containing the Rsyn7::GUS (6086) or the UBI::GUS (3953) construct. 1 cm
root segments from six- to seven-day-old transgenic maize seedlings were dissected, weighed and
assayed for GUS using the GUS-Light kit. Activity is represented as parts per million of fresh weight.
The root activity of several T1 plants harboring the Rsyn7::GUS promoter shows higher activity
than much of the activity levels produced by the UBI promoter. This is consistent with data from
T0 transgenic plants. Activity levels in Rsyn7::GUS-containing young leaves are also much
higher than the activity levels of UBI::GUS-containing young leaves (data not shown). The Syn
II Core sequence was shown to function well with a variety of upstream elements including GAL
4 binding sites, Rsyn7 elements, GBL elements, etc.

Figure 9 shows GUS expression of three synthetic promoters in T0 transgenic maize
plants. Dissected tissues (see Figure 7B) from VT stage transgenic T0 plants harboring Rsyn7
(Rsyn), Atsyn or the Syn II Core alone (syn-core) promoter::GUS constructs were quantitatively
assayed for GUS activity. Each circle represents an average of tissue activity of transgenic maize
plants from a single transformation event. The TGACG-motif corresponds to the Rsyn7
sequence and the "AT-com" motif refers to the consensus AT-like composite element, Atcom,
from W. Gurley, et al. 1993. In: Control of Plant Gene Expression, ed. by Desh Pal Verma.
CRC Press, Boca Raton, FL. pp. 103-123. Syn-core refers to the Syn II Core promoter sequence
containing the TATA element and the start of transcription.



EXAMPLE 5

Construction of Rsyn7 promoter having the upstream activating regions from the CaMV
35S gene and the maize Ubi-1 gene.

To construct an expression vector having the 35SU (upstream activating regions from the 35S
gene)-Rsyn7 promoter, PHP413 was digested with BglII and EcoRV. The staggered/sticky ends
of the linearized vector were filled in by Klenow in the presence of dNTP. The 2x CaMV
fragment was blunt-end ligated into BamHI-digested PHP6086 after filling the BamHI ends.
The CaMV 35S-Rsyn7 fragment was then cut out from the new construct by digestion with XbaI
and PstI, and ligated into the 4 kb XbaI-PstI vector from PHP9925 to form the expression
vector 35SU-Rsyn7::GUS (PHP9778).

To construct an expression vector having the UbiU (upstream activating elements from the
maize Ubi-1 gene)-Rsyn7 promoter, the XbaI-SpeI fragment from PHP8277 was ligated into the
XbaI site of PHP6086 to form PHP10539, into the XbaI site of PHP10970 to form PHP10971,
and into the XbaI site of PHP10971 to form the expression vector Ubi-1-Rsyn7::GUS
(PHP10972).

Those sequences not referenced otherwise include:

SEQ ID NO: 11 sets forth the 35S UAR.

SEQ ID NO: 12 sets forth the SCP1 promoter sequence (35S UAR operably linked to the
core promoter of SEQ ID NO: 1).

SEQ ID NO: 13 sets forth the Ubi1 UAR.

SEQ ID NO: 14 sets forth SCP1 operably linked to the oxalate oxidase coding sequence
operably linked with the PinII terminator.

SEQ ID NO: 16 sets forth the UCP2 promoter sequence (2 copies of Ubi1 UAR operably
linked with the core promoter).

SEQ ID NO: 18 sets forth the UCP4 promoter sequence (4 copies of Ubi1 UAR operably
linked with the core promoter).

EXAMPLE 6

Transformation and Expression of Promoter Constructs.

The various promoter::GUS fragments were cloned into a Bin9 binary vector that
contains ALS3::NPTII as a selection marker for generating transgenic sunflower callus or
Arabidopsis.

For transient expression, SMF3 sunflower seeds were planted in the greenhouse. Seeds were
collected from the plants 15 days after pollination and used in the transient expression system.
After removing the pericarp, the cotyledons with seed coats were sterilized by incubation in 20%
bleach at RT for 15 minutes, and washed four times with sterile double distilled water. The
cotyledons were then incubated on 3MM filter wetted with MS medium overnight before they
were bombarded according to the method disclosed by Klein et al. (1989) Proc. Natl. Acad. Sci.
USA 86:6681-6685, hereby incorporated by reference thereto. GUS activity was analyzed 20
hours after bombardment using the GUS-Light assay kit from Tropix according to the
manufacturer's protocol.

For leaf disc transformation, young expanded SMF3 sunflower leaves from 30-day-old
sunflower plants were harvested and sterilized in 20% bleach with a couple of drops of Tween 20
for 20 minutes. Leaf discs were prepared from the sterile leaves after washing them with sterile
double distilled water 4 times. The leaf discs were then incubated for 10 minutes in inoculation
medium (12.5 mM MES, 1 g/l NH4Cl, and 0.3 g/l MgSO4) containing Agrobacterium (EHA105)
transformed with the vector constructs to be tested at A600=0.75. The leaf discs were then
grown for 3 days in non-selection medium and were then transferred to selection medium.

To transform Arabidopsis, Arabidopsis plants were grown in the greenhouse at 15 plants/pot
to the stage when bolts start to emerge. The emerging bolts were clipped off to encourage the growth
of multiple secondary bolts. After 7 days, the plants were ready for infiltration. Agrobacterium
(EHA105) carrying the construct to be tested was cultured at 28°C until A600 was between
0.65 and 0.8. The cells were harvested in inoculation medium (4.3 g/l of MS salt, 0.5 mg/l of
nicotinic acid, 0.5 mg/l of pyridoxine-HCl, 1 mg/l of Thiamine-HCl, 0.1 g of myo-inositol, 1 g/l
of casamino acids, 0.01 mg of BAP, 68.5 g/l of sucrose, and 36 g/l of glucose) at A600 of 7.5.

The clipped plants to be transformed were inverted into a 250 ml beaker containing the
above Agrobacterium solution. The beaker was placed into a bell jar and was vacuumed until
bubbles formed on leaf and stem surfaces. After 15 minutes of infiltration, the vacuum was
released and the plants were removed from the beaker, laid on their sides in a plastic flat, and
covered with plastic wrap. The plants were set upright and grown in the greenhouse for four weeks
before seeds were harvested. Transgenic seeds were selected by planting the seeds on a medium
plate containing 65 ng/ml of kanamycin.

In transient expression assays, the Rsyn7 promoter in PHP10464 has 15% of the 35S
promoter (PHP9925) activity, whereas the 35SU-Rsyn7 promoter (PHP9778) (hereinafter SCP1
promoter) has about 107% of the 35S promoter activity (Figure 10). Thus, the upstream
activating region of the CaMV 35S gene increased the Rsyn7 activity by about 6 fold.
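
The arithmetic behind this comparison can be sketched as follows, using only the two percentages stated above. The function names are hypothetical. The sketch distinguishes the ratio of the two activities from the increase over the baseline, since the two figures differ by one.

```python
def fold_ratio(enhanced: float, baseline: float) -> float:
    """Activity of the enhanced promoter as a multiple of the baseline."""
    return enhanced / baseline


def fold_increase(enhanced: float, baseline: float) -> float:
    """Gain over the baseline, expressed in multiples of the baseline."""
    return (enhanced - baseline) / baseline


# Activities relative to the 35S promoter (= 100%), as stated in the text.
rsyn7_activity = 15.0    # Rsyn7 alone
scp1_activity = 107.0    # 35SU-Rsyn7 (SCP1)

print(round(fold_ratio(scp1_activity, rsyn7_activity), 1))     # 7.1
print(round(fold_increase(scp1_activity, rsyn7_activity), 1))  # 6.1
```

On these figures the SCP1 promoter has about 7.1 times the Rsyn7 activity, which corresponds to an increase of about 6.1 times the Rsyn7 activity.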

The maize Ubi-1 upstream element (UbiU) has similar effects on the Rsyn7 promoter in
transient assays. When the upstream activating region (UAR) of the maize Ubi-1 was fused to
Rsyn7, the GUS enzyme activity increased with the number of copies of the UAR. In the presence
of three copies of the UbiU, GUS activity increased by about 4-fold. This additive effect of UbiU
was not observed when placed in the context of the maize ubiquitin promoter (PHP11974). This
suggests that replacement of the maize Ubi1 core promoter with Rsyn7 may convert the monocot
Ubi1 promoter into a highly active promoter in dicot plants (Figure 11).

In the stably transformed sunflower callus, GUS expression is 20% higher under the
control of the 35SU-Rsyn7 (SCP1) promoter than when under the control of the 35S
CaMV promoter (Figure 12).

The results of the transgenic callus assay are given in Figure 13. The Rsyn7 promoter
containing a single copy of UbiU (PHP10991) (hereinafter UCP1 promoter) (SEQ ID NO: 15)
exhibited promoter activity of about 3 times that of the 35SU-Rsyn7 (SCP1) promoter
(PHP10940). Three copies of UbiU (PHP10993) increased Rsyn7 promoter activity to about 7
times that of the 35SU-Rsyn7 promoter. The 3xUbiU-Rsyn7 (UCP3 promoter) (SEQ ID NO:
17) is by far the strongest promoter in sunflower tissues.

To determine the activity and tissue-specificity of the enhanced Rsyn7 promoters,
stably transformed sunflower and Arabidopsis were generated through Agrobacterium-mediated
transformation. The histochemical staining of GUS expression in transgenic T1 Arabidopsis
indicates that 35SU-Rsyn7 (SCP1 promoter) (PHP10940) has identical tissue-specificity and
similar activity to the 35S CaMV promoter (PHP10989). Both promoters express GUS in leaf, stem,
petiole, and floral parts. UbiU-Rsyn7 (UCP1 promoter) (PHP10991) exhibits higher activity
than the maize Ubi-1 promoter (PHP11031) in Arabidopsis stem and leaf tissues.

All publications and patent applications mentioned in the specification are indicative of 
the level of those skilled in the art to which this invention pertains. All publications and patent 
applications are herein incorporated by reference to the same extent as if each individual 
publication or patent application was specifically and individually indicated to be incorporated 
by reference. 

Although the foregoing invention has been described in some detail by way of illustration 
and example for purposes of clarity of understanding, it will be obvious that certain changes and 
modifications may be practiced within the scope of the appended claims.

IN THE CLAIMS



What is claimed is: 

1. A synthetic DNA plant promoter sequence, said sequence comprising:
a TATA motif;
a transcription start site; and
a region between said TATA motif and said start site that is at least about 64%
GC-rich.

2. The promoter of Claim 1, wherein said promoter sequence is SEQ ID NO: 10.

3. The promoter of Claim 1, wherein said promoter sequence is SEQ ID NO: 1.

4. An expression cassette comprising:
a synthetic promoter comprising a TATA motif, a transcription start site and a
region there between that is at least about 64% GC;
a structural gene operatively linked to said promoter; and
a transcription end site polyadenylation signal.

5. The expression cassette of Claim 4, wherein said promoter is SEQ ID NO: 1.

6. The expression cassette of Claim 4, wherein said promoter is SEQ ID NO: 10.



7. The expression cassette of Claim 4, further comprising an upstream element
operatively linked to said promoter so that transcription is enhanced.

8. The expression cassette of Claim 7, wherein said upstream element is SEQ ID NO: 2.

9. A nucleic acid vector comprising the promoter of Claim 1, 2, or 3 operatively
linked to a structural gene.

10. The vector of Claim 9, wherein said vector is a cloning vector.

11. The vector of Claim 9, wherein said vector is an expression vector.

12. The vector of Claim 9, further comprising a marker gene for selection of
transformed cells.

13. The vector of Claim 12, wherein said marker gene is an antibiotic resistance gene.

14. The vector of Claim 9, further comprising a polyadenylation signal.

15. The vector of Claim 9, wherein said vector further comprises an upstream element
operatively linked to said promoter.

16. The vector of Claim 15, wherein said upstream element is SEQ ID NO: 2.

17. A prokaryotic or eukaryotic host cell transformed with the nucleic acid vector of
Claim 9.

18. A transgenic plant comprising:
a plant cell or ancestor thereof which has been transformed with the vector of
Claim 9.

19. A synthetic upstream element having a sequence of SEQ ID NO: 2.

20. An expression cassette comprising:
a promoter sequence;
a structural gene operatively linked to said promoter sequence;
a polyadenylation signal; and
a synthetic upstream element homologous to SEQ ID NO: 2 operatively linked to
said promoter so that expression is enhanced.

21. The expression cassette of Claim 20, wherein said synthetic promoter sequence
is SEQ ID NO: 2.

22. A nucleic acid vector comprising the expression cassette of Claim 20. 

23. The vector of Claim 20, wherein said synthetic upstream element is SEQ ID NO: 2.

24. A prokaryotic or eukaryotic host cell transformed with the vector of Claim 22. 

25. A DNA sequence comprising a promoter construct, said construct comprising in 
operable linkage: 

a core synthetic promoter sequence comprising a TATA motif, a transcription start 
site, and a region between said TATA motif and said start site that is at least 64% GC-rich; and 
an upstream activating region operably linked to said core synthetic promoter. 

26. The DNA sequence of Claim 25 wherein said promoter construct further 
comprises a synthetic upstream element. 

27. A DNA sequence in accordance with Claim 25, wherein said upstream activating
region comprises an upstream activating region of CaMV 35S.

28. A DNA sequence in accordance with Claim 26, wherein said upstream activating
region of CaMV 35S comprises from about -420 to about -90 of the CaMV 35S gene.

29. A DNA sequence in accordance with Claim 25, wherein said upstream activating
region comprises an upstream activating region of maize Ubi-1 promoter.

30. A DNA sequence in accordance with Claim 29, wherein said upstream activating
region of maize Ubi-1 promoter comprises approximately from about -865 to about -54 of the
maize Ubi-1 gene.

31. The DNA sequence of Claim 25, wherein said upstream activating region 
comprises at least one upstream activating region, said upstream activating region selected from 
the group consisting of CaMV 35S UAR and Ubi-1 UAR. 

32. An expression cassette comprising: 

a core synthetic promoter sequence comprising a TATA motif, a transcription start 
site, and a region between said TATA motif and said start site that is at least 64% GC-rich; 

an upstream activating region operably linked to said synthetic promoter to
enhance transcription;

a structural gene operably linked to said synthetic promoter; and

a polyadenylation signal.

33. An expression cassette of Claim 32, wherein said upstream activating region
comprises a DNA sequence selected from the group consisting of an upstream activating region
of CaMV 35S and an upstream activating region of the maize Ubi-1 gene.

34. A DNA vector comprising the DNA sequence of Claim 25 operably linked to a 
structural gene. 

35. The expression cassette of Claim 33, wherein said structural gene is an oxalate 
oxidase gene. 

36. A transgenic plant comprising:
a plant cell or ancestor thereof which has been transformed with the vector of
Claim 34.

37. A transgenic plant comprising:
a plant cell or ancestor thereof which has been transformed with the vector of
Claim 35.

38. A method for controlling the level of expression of a transgenic nucleotide
sequence in a plant cell said method comprising transforming with an expression cassette
comprising a promoter having at least one Ubi-1 UAR.

39. An isolated nucleotide sequence comprising a DNA enhancer sequence
comprising the nucleotide sequence set forth in SEQ ID NO: 5.

40. A nucleotide sequence comprising a promoter construct, said construct comprising
in operable linkage:
a core promoter sequence; and
a Ubi-1 UAR.

41. The nucleotide sequence of claim 38, wherein said Ubi-1 UAR is a maize Ubi
UAR.

42. The nucleotide sequence of Claim 41, wherein said Ubi UAR comprises the
sequence set forth in SEQ ID NO: 13.

43. The nucleotide sequence of Claim 40 wherein said promoter construct comprises
at least one copy of said Ubi UAR.

44. The nucleotide sequence of Claim 40 wherein said promoter construct comprises
at least two copies of said Ubi UAR.

45. The nucleotide sequence of Claim 40 wherein said promoter construct comprises
at least three copies of said Ubi UAR.

46. An expression cassette comprising in operable linkage:
a core promoter sequence;
a Ubi UAR operably linked to said core promoter to form a synthetic promoter
construct;
a nucleotide sequence of interest operably linked to said synthetic promoter; and
a polyadenylation signal.

47. The expression cassette of Claim 46, wherein said upstream Ubi-1 UAR
comprises the sequence set forth in SEQ ID NO: 13.

48. A DNA vector comprising the expression cassette of Claim 46.

49. The expression cassette of Claim 46 wherein said nucleotide sequence encodes
oxalate oxidase.

50. A transformed plant containing in its genome the expression cassette of Claim 49.

51. The plant of Claim 50 wherein said plant is sunflower.

52. Seed of the plant of Claim 50.

53. Seed of the plant of Claim 51.

54. A transgenic sunflower expressing an exogenous oxalate oxidase gene at a high
level wherein said oxalate oxidase gene is under the transcriptional control of the recombinant
promoter of claim 1 and further having at least one operably linked upstream activating region
of the 35S CaMV promoter.

(SEQ ID NO: 3)
5'...TATA(A/T)A(A/T)A.....25bp TYYTCAT(A/C)AA 3'
FIG. 1.

SYNII CORE (64%)
(SEQ ID NO: 1)
...GGATCCACTCGAGCGGCTATAAATAC...

CAM35S (40%)
(SEQ ID NO: 4)
...CCTCTATATAAGCAAGTTCATTTCATTTGGAGAGGAAACG...
FIG. 2.

RSYN7 (SEQ ID NO: 2)
5' GGATCCTATGCGTATGGTATGACGTGTGTTCAAGATGATGACTTCAAACCTACCTATGACGTAT
GGTATGACGTGTGTCGACTGATGACTTAGATC 3'
FIG. 3.

PHP6086 (5.579 Kb)
FIG. 4.

SEQUENCE LISTING



(1) GENERAL INFORMATION: 



(i) APPLICANT: Bruce, Wesley
Bowen, Benjamin A.
Sims, Lynne
Tagliani, Laura A.
Lu, Guihua

(ii) TITLE OF INVENTION: SYNTHETIC PROMOTERS 
(iii) NUMBER OF SEQUENCES: 18 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: W. Murray Spruill (Alston & Bird, LLP) 

(B) STREET: 3605 Glenwood Ave. Suite 310 

(C) CITY: Raleigh 

(D) STATE: NC 

(E) COUNTRY: USA 

(F) ZIP: 27622 



(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 



(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Spruill, W. Murray 

(B) REGISTRATION NUMBER: 32,943

(C) REFERENCE/DOCKET NUMBER: 5718-20 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 919 420 2202 

(B) TELEFAX: 919 881 3175 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 72 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
GGATCCACTC GAGCGGCTAT AAATACGTAC CTACGCACGC TGCGCTACCA TCCCGAGCAC 
TGCAGTGTCG AC 
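
By way of illustration only, the GC content recited for the region downstream of the TATA motif can be estimated computationally. The following Python sketch is not part of the original disclosure. The helper names (clean, gc_fraction, region_after_motif), the use of TATAAATA as the motif string, and the choice of window are assumptions made for the example; in particular, the position of the transcription start site is not stated in this listing, so the window length is left as a parameter.

```python
from typing import Optional

# Illustrative sketch only: helper names and the window choice are assumptions,
# not part of the original disclosure.

# SEQ ID NO: 1 as printed in the sequence listing (72 base pairs).
SEQ_ID_NO_1 = (
    "GGATCCACTC GAGCGGCTAT AAATACGTAC CTACGCACGC TGCGCTACCA TCCCGAGCAC "
    "TGCAGTGTCG AC"
)


def clean(seq: str) -> str:
    """Remove all whitespace and upper-case the sequence."""
    return "".join(seq.split()).upper()


def gc_fraction(seq: str) -> float:
    """Fraction of positions that are G, C, or S (IUPAC code for G or C)."""
    bases = clean(seq)
    if not bases:
        raise ValueError("empty sequence")
    return sum(bases.count(b) for b in "GCS") / len(bases)


def region_after_motif(seq: str, motif: str, length: Optional[int] = None) -> str:
    """Return the bases 3' of the first occurrence of `motif`.

    `length` limits the window; None returns everything up to the 3' end.
    """
    bases = clean(seq)
    start = bases.find(motif)
    if start < 0:
        raise ValueError("motif not found")
    begin = start + len(motif)
    end = None if length is None else begin + length
    return bases[begin:end]


if __name__ == "__main__":
    # Assumed window: everything 3' of the TATA motif in SEQ ID NO: 1.
    downstream = region_after_motif(SEQ_ID_NO_1, "TATAAATA")
    print(f"{len(downstream)} bases, GC fraction {gc_fraction(downstream):.1%}")
```

Because S is counted as G or C, the same helper can be applied to a degenerate sequence such as SEQ ID NO: 10. A narrower window can be examined by passing an explicit length once the start site position is known.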



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 96 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Synthetic nucleic acid" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
GGATCCTATG CGTATGGTAT GACGTGTGTT CAAGATGATG ACTTCAAACC TACCTATGAC 
GTATGGTATG ACGTGTGTCG ACTGATGACT TAGATC 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
TATAWAWATY YTCATMAA 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
CCTCTATATA AGCAAGTTCA TTTCATTTGG AGAGGAAACG 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
TGAGC 5 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 66 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Synthetic oligonucleotide" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
TCGACACTGC AGCTCTAGGG ATGGTAGCGC AGGGTGCGTA GGTACGTATT TATAGCCGCT 60 
CGAGTG 66 
(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 66 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Synthetic oligonucleotide" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
GATCCACTCG AGCGGCTATA AATACGTACC TACGCACCCT GCGCTACCAT CCCTAGAGCT 60 
GCAGTG 66 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 92 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Synthetic oligonucleotide" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GATCCTATGA CGTATGGTAT GACGTGTGTT CAAGATGATG ACTTCAAACC TACCTATGAC 
GTATGGTATG ACGTGTGTCG ACTGATGACT TA 
(2) INFORMATION FOR SEQ ID NO: 9: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 92 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "Synthetic oligonucleotide" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 



GATCTAAGTC ATCAGTCGAC ACACGTCATA CCATACGTCA TAGGTAGGTT TGAAGTCATC 
ATCTTGAACA CACGTCATAC CATACGTCAT AG 
(2) INFORMATION FOR SEQ ID NO: 10:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 72 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Synthetic nucleic acid core 
promoter with G/C transversions" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 



GGATCCACTC GAGCGGCTAT AAATASSTAS STASSSASSS TSSSSTASSA TCCCGAGCAC 
TGCAGTGTCG AC 

(2) INFORMATION FOR SEQ ID NO: 11: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 332 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CGTCAACATG GTGGAGCACG ACACTCTCGT CTACTCCAAG AATATCAAAG ATACAGTCTC 
AGAAGACCAA AGGGCTATTG AGACTTTTCA ACAAAGGGTA ATATCGGGAA ACCTCCTCGG 
ATTCCATTGC CCAGCTATCT GTCACTTCAT CAAAAGGACA GTAGAAAAGG AAGGTGGCAC 
CTACAAATGC CATCATTGCG ATAAAGGAAA GGCTATCGTT CAAGATGCCT CTGCCGACAG 
TGGTCCCAAA GATGGACCCC CACCCACGAG GAGCATCGTG GAAAAAGAAG ACGTTCCAAC 
CACGTCTTCA AAGCAAGTGG ATTGATGTGA TG 
(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 499 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

CGTCAACATG GTGGAGCACG ACACTCTCGT CTACTCCAAG AATATCAAAG ATACAGTCTC 60 

AGAAGACCAA AGGGCTATTG AGACTTTTCA ACAAAGGGTA ATATCGGGAA ACCTCCTCGG 120 

ATTCCATTGC CCAGCTATCT GTCACTTCAT CAAAAGGACA GTAGAAAAGG AAGGTGGCAC 180 

CTACAAATGC CATCATTGCG ATAAAGGAAA GGCTATCGTT CAAGATGCCT CTGCCGACAG 240

TGGTCCCAAA GATGGACCCC CACCCACGAG GAGCATCGTG GAAAAAGAAG ACGTTCCAAC 300 

CACGTCTTCA AAGCAAGTGG ATTGATGTGA TGATCCTATG CGTATGGTAT GACGTGTGTT 360 

CAAGATGATG ACTTCAAACC TACCTATGAC GTATGGTATG ACGTGTGTCG ACTGATGACT 420 

TAGATCCACT CGAGCGGCTA TAAATACGTA CCTACGCACC CTGCGCTACC ATCCCTAGAG 480 
CTGCATGCTT ATTTTTACA 499
(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 813 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Zea mays

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

TCTAGAGATA ATGAGCATTG CATGTCTAAG TTATAAAAAA TTACCACATA TTTTTTTTGT 60
CACACTTGTT TGAAGTGCAG TTTATCTATC TTTATACATA TATTTAAACT TTACTCTACG 120
AATAATATAA TCTATAGTAC TACAATAATA TCAGTGTTTT AGAGAATCAT ATAAATGAAC 180
AGTTAGACAT GGTCTAAAGG ACAATTGAGT ATTTTGACAA CAGGACTCTA CAGTTTTATC 240
TTTTTAGTGT GCATGTGTTC TCCTTTTTTT TTGCAAATAG CTTCACCTAT ATAATACTTC 300
ATCCATTTTA TTAGTACATC CATTTAGGGT TTAGGGTTAA TGGTTTTTAT AGACTAATTT 360
TTTTAGTACA TCTATTTTAT TCTATTTTAG CCTCTAAATT AAGAAAACTA AAACTCTATT 420
TTAGTTTTTT TATTTAATAA TTTAGATATA AAATAGAATA AAATAAAGTG ACTAAAAATT 480
AAACAAATAC CCTTTAAGAA ATTAAAAAAA CTAAGGAAAC ATTTTTCTTG TTTCGAGTAG 540
ATAATGCCAG CCTGTTAAAC GCCGTCGACG AGTCTAACGG ACACCAACCA GCGAACCAGC 600
AGCGTCGCGT CGGGCCAAGC GAAGCAGACG GCACGGCATC TCTGTCGCTG CCTCTGGACC 660
CCTCTCGAGA GTTCCGCTCC ACCGTTGGAC TTGCTCCGCT GTCGGCATCC AGAAATTGCG 720
TGGCGGAGCG GCAGACGTGA GCCGGCACGG CAGGCGGCCT CCTCCTCCTC TCACGGCACG 780
GCAGCTACGG GGGATTCCTT TCCCACCGCT CCT 813


(2) INFORMATION FOR SEQ ID NO: 14: 











(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1600 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: other nucleic acid 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

GATCTGAGTC TAGAAATCCG TCAACATGGT GGAGCACGAC ACTCTCGTCT ACTCCAAGAA 60 

TATCAAAGAT ACAGTCTCAG AAGACCAAAG GGCTATTGAG ACTTTTCAAC AAAGGGTAAT 120 

ATCGGGAAAC CTCCTCGGAT TCCATTGCCC AGCTATCTGT CACTTCATCA AAAGGACAGT 180 

AGAAAAGGAA GGTGGCACCT ACAAATGCCA TCATTGCGAT AAAGGAAAGG CTATCGTTCA 240

AGATGCCTCT GCCGACAGTG GTCCCAAAGA TGGACCCCCA CCCACGAGGA GCATCGTGGA 300 

AAAAGAAGAC GTTCCAACCA CGTCTTCAAA GCAAGTGGAT TGATGTGATG ATCCTATGCG 360 

TATGGTATGA CGTGTGTTCA AGATGATGAC TTCAAACCTA CCTATGACGT ATGGTATGAA 420

CGTGTGTCGA CTGATGACTT AGATCCACTC GAGCGGCTAT AAATACGTAC CTACGCACCC 480

TGCGCTACCA TCCCTAGAGC TGCAGCTTAT TTTTACAACA ATTACCAACA ACAACAAACA 540

ACAAACAACA TTACAATTAC TATTTACAAT TACAGTCGAC CCGGGATCCA TGGGGTACTC 600 

CAAAACCCTA GTAGCTGGCC TGTTCGCAAT GCTGTTACTA GCTCCGGCCG TCTTGGCCAC 660 

CGACCCAGAC CCTCTCCAGG ACTTCTGTGT CGCCGACCTC GACGGCAAGG CGGTCTCGGT 720 

GAACGGGCAC ACGTGCAAGC CCATGTCGGA GGCCGGCGAC GACTTCCTCT TCTCGTCCAA 780 

GTTGGCCAAG GCCGGCAACA CGTCCACCCC GAACGGCTCC GCCGTGACGG AGCTCGACGT 840 

GGCCGAGTGG CCCGGTACCA ACAAGCTGGG TGGTGTCATG AACCGCGTGG ATTTTGGTCC 900 

CGGAGGGACC AACCCACCAC ACATCCACCC GCGTGCCACC GAGATCGGCA TCGTGATGAA 960 

AGGTGAGCTT CTCGTGGGAA TCCTTGGCAG CCTCGACTCC GGGAACAAGC TCTACTCGAG 1020 

GGTGGTGCGC GCCGGAGAGA CGTTCCTCAT CCCACGGGGC CTCATGCACT TCCAGTTCAA 1080

CGTCGGTAAG ACCGAGGCCT CCATGGTCGT CTCCTTCAAC AGCCAGAACC CCGGCATTGT 1140

CTTCGTGCCC CTCACGCTCT TCGGCTCCAA CCCGCCCATC CCAACGCCGG TGCTCACCAA 1200 

GGCACTCCGG GTGGAGGCCA GGGTCGTGGA ACTTCTCAAG TCCAAGTTTG CCGCTGGGTT 1260 

TTAATTTCTA GGATCCTCTA GAGTCGAACC TAGACTTGTC CATCTTCTGG ATTGGCCAAC 1320

TTAATTAATG TATGAAATAA AAGGATGCAC ACATAGTGAC ATGCTAATCA CTATAATGTG 1380 

GGCATCAAAG TTGTGTGTTA TGTGTAATTA CTAGTTATCT GAATAAAAGA GAAAGAGATC 1440

ATCCATATTT CTTATCCTAA ATGAATGTCA CGTGTCTTTA TAATTCTTTG ATGAACCAGA 1500 

TGCATTTCAT TAACCAAATC CATATACATA TAAATATTAA TCATATATAA TTAATATCAA 1560 

TTGGGTTAGC AAAACAAATC TAGTCTAGGT GTGTTTTGCC 1600 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 994 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
TCTAGAGATA ATGAGCATTG CATGTCTAAG TTATAAAAAA TTACCACATA TTTTTTTTGT 60
CACACTTGTT TGAAGTGCAG TTTATCTATC TTTATACATA TATTTAAACT TTACTCTACG 120
AATAATATAA TCTATAGTAC TACAATAATA TCAGTGTTTT AGAGAATCAT ATAAATGAAC 180
AGTTAGACAT GGTCTAAAGG ACAATTGAGT ATTTTGACAA CAGGACTCTA CAGTTTTATC 240
TTTTTAGTGT GCATGTGTTC TCCTTTTTTT TTGCAAATAG CTTCACCTAT ATAATACTTC 300
ATCCATTTTA TTAGTACATC CATTTAGGGT TTAGGGTTAA TGGTTTTTAT AGACTAATTT 360
TTTTAGTACA TCTATTTTAT TCTATTTTAG CCTCTAAATT AAGAAAACTA AAACTCTATT 420
TTAGTTTTTT TATTTAATAA TTTAGATATA AAATAGAATA AAATAAAGTG ACTAAAAATT 480
AAACAAATAC CCTTTAAGAA ATTAAAAAAA CTAAGGAAAC ATTTTTCTTG TTTCGAGTAG 540
ATAATGCCAG CCTGTTAAAC GCCGTCGACG AGTCTAACGG ACACCAACCA GCGAACCAGC 600
AGCGTCGCGT CGGGCCAAGC GAAGCAGACG GCACGGCATC TCTGTCGCTG CCTCTGGACC 660
CCTCTCGAGA GTTCCGCTCC ACCGTTGGAC TTGCTCCGCT GTCGGCATCC AGAAATTGCG 720
TGGCGGAGCG GCAGACGTGA GCCGGCACGG CAGGCGGCCT CCTCCTCCTC TCACGGCACG 780
GCAGCTACGG GGGATTCCTT TCCCACCGCT CCTACTAGAA CTAGTGGATC CTATGCGTAT 840
GGTATGACGT GTGTTCAAGA TGATGACTTC AAACCTACCT ATGACGTATG GTATGACGTG 900
TGTCGACTGA TGACTTAGAT CCACTCGAGC GGCTATAAAT ACGTACCTAC GCACCCTGCG 960
CTACCATCCC TAGAGCTGCA TGCTTATTTT TACA 994
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1807 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

TCTAGAGATA ATGAGCATTG CATGTCTAAG TTATAAAAAA TTACCACATA TTTTTTTTGT 60

CACACTTGTT TGAAGTGCAG TTTATCTATC TTTATACATA TATTTAAACT TTACTCTACG 120 

AATAATATAA TCTATAGTAC TACAATAATA TCAGTGTTTT AGAGAATCAT ATAAATGAAC 180 

AGTTAGACAT GGTCTAAAGG ACAATTGAGT ATTTTGACAA CAGGACTCTA CAGTTTTATC 240 

TTTTTAGTGT GCATGTGTTC TCCTTTTTTT TTGCAAATAG CTTCACCTAT ATAATACTTC 300 

ATCCATTTTA TTAGTACATC CATTTAGGGT TTAGGGTTAA TGGTTTTTAT AGACTAATTT 360 

TTTTAGTACA TCTATTTTAT TCTATTTTAG CCTCTAAATT AAGAAAACTA AAACTCTATT 420

TTAGTTTTTT TATTTAATAA TTTAGATATA AAATAGAATA AAATAAAGTG ACTAAAAATT 480

AAACAAATAC CCTTTAAGAA ATTAAAAAAA CTAAGGAAAC ATTTTTCTTG TTTCGAGTAG 540

ATAATGCCAG CCTGTTAAAC GCCGTCGACG AGTCTAACGG ACACCAACCA GCGAACCAGC 600

AGCGTCGCGT CGGGCCAAGC GAAGCAGACG GCACGGCATC TCTGTCGCTG CCTCTGGACC 660 

CCTCTCGAGA GTTCCGCTCC ACCGTTGGAC TTGCTCCGCT GTCGGCATCC AGAAATTGCG 720 

TGGCGGAGCG GCAGACGTGA GCCGGCACGG CAGGCGGCCT CCTCCTCCTC TCACGGCACG 780 

GCAGCTACGG GGGATTCCTT TCCCACCGCT CCTACTAGAG ATAATGAGCA TTGCATGTCT 840

AAGTTATAAA AAATTACCAC ATATTTTTTT TGTCACACTT GTTTGAAGTG CAGTTTATCT 900 

ATCTTTATAC ATATATTTAA ACTTTACTCT ACGAATAATA TAATCTATAG TACTACAATA 960 

ATATCAGTGT TTTAGAGAAT CATATAAATG AACAGTTAGA CATGGTCTAA AGGACAATTG 1020 

AGTATTTTGA CAACAGGACT CTACAGTTTT ATCTTTTTAG TGTGCATGTG TTCTCCTTTT 1080 

TTTTTGCAAA TAGCTTCACC TATATAATAC TTCATCCATT TTATTAGTAC ATCCATTTAG 1140 

GGTTTAGGGT TAATGGTTTT TATAGACTAA TTTTTTTAGT ACATCTATTT TATTCTATTT 1200 

TAGCCTCTAA ATTAAGAAAA CTAAAACTCT ATTTTAGTTT TTTTATTTAA TAATTTAGAT 1260 

ATAAAATAGA ATAAAATAAA GTGACTAAAA ATTAAACAAA TACCCTTTAA GAAATTAAAA 1320 

AAACTAAGGA AACATTTTTC TTGTTTCGAG TAGATAATGC CAGCCTGTTA AACGCCGTCG 1380 

ACGAGTCTAA CGGACACCAA CCAGCGAACC AGCAGCGTCG CGTCGGGCCA AGCGAAGCAG 1440

ACGGCACGGC ATCTCTGTCG CTGCCTCTGG ACCCCTCTCG AGAGTTCCGC TCCACCGTTG 1500 

GACTTGCTCC GCTGTCGGCA TCCAGAAATT GCGTGGCGGA GCGGCAGACG TGAGCCGGCA 1560 

CGGCAGGCGG CCTCCTCCTC CTCTCACGGC ACGGCAGCTA CGGGGGATTC CTTTCCCACC 1620 

GCTCCTACTA GAACTAGTGG ATCCTATGCG TATGGTATGA CGTGTGTTCA AGATGATGAC 1680 

TTCAAACCTA CCTATGACGT ATGGTATGAC GTGTGTCGAC TGATGACTTA GATCCACTCG 1740

AGCGGCTATA AATACGTACC TACGCACCCT GCGCTACCAT CCCTAGAGCT GCATGCTTAT 1800

TTTTACA 1807
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2620 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
TCTAGAGATA ATGAGCATTG CATGTCTAAG TTATAAAAAA TTACCACATA TTTTTTTTGT 60
CACACTTGTT TGAAGTGCAG TTTATCTATC TTTATACATA TATTTAAACT TTACTCTACG 120
AATAATATAA TCTATAGTAC TACAATAATA TCAGTGTTTT AGAGAATCAT ATAAATGAAC 180
AGTTAGACAT GGTCTAAAGG ACAATTGAGT ATTTTGACAA CAGGACTCTA CAGTTTTATC 240
TTTTTAGTGT GCATGTGTTC TCCTTTTTTT TTGCAAATAG CTTCACCTAT ATAATACTTC 300
ATCCATTTTA TTAGTACATC CATTTAGGGT TTAGGGTTAA TGGTTTTTAT AGACTAATTT 360
TTTTAGTACA TCTATTTTAT TCTATTTTAG CCTCTAAATT AAGAAAACTA AAACTCTATT 420
TTAGTTTTTT TATTTAATAA TTTAGATATA AAATAGAATA AAATAAAGTG ACTAAAAATT 480
AAACAAATAC CCTTTAAGAA ATTAAAAAAA CTAAGGAAAC ATTTTTCTTG TTTCGAGTAG 540
ATAATGCCAG CCTGTTAAAC GCCGTCGACG AGTCTAACGG ACACCAACCA GCGAACCAGC 600
AGCGTCGCGT CGGGCCAAGC GAAGCAGACG GCACGGCATC TCTGTCGCTG CCTCTGGACC 660
CCTCTCGAGA GTTCCGCTCC ACCGTTGGAC TTGCTCCGCT GTCGGCATCC AGAAATTGCG 720
TGGCGGAGCG GCAGACGTGA GCCGGCACGG CAGGCGGCCT CCTCCTCCTC TCACGGCACG 780
GCAGCTACGG GGGATTCCTT TCCCACCGCT CCTACTAGAG ATAATGAGCA TTGCATGTCT 840
AAGTTATAAA AAATTACCAC ATATTTTTTT TGTCACACTT GTTTGAAGTG CAGTTTATCT 900
ATCTTTATAC ATATATTTAA ACTTTACTCT ACGAATAATA TAATCTATAG TACTACAATA 960
ATATCAGTGT TTTAGAGAAT CATATAAATG AACAGTTAGA CATGGTCTAA AGGACAATTG 1020
AGTATTTTGA CAACAGGACT CTACAGTTTT ATCTTTTTAG TGTGCATGTG TTCTCCTTTT 1080
TTTTTGCAAA TAGCTTCACC TATATAATAC TTCATCCATT TTATTAGTAC ATCCATTTAG 1140
GGTTTAGGGT TAATGGTTTT TATAGACTAA TTTTTTTAGT ACATCTATTT TATTCTATTT 1200
TAGCCTCTAA ATTAAGAAAA CTAAAACTCT ATTTTAGTTT TTTTATTTAA TAATTTAGAT 1260
ATAAAATAGA ATAAAATAAA GTGACTAAAA ATTAAACAAA TACCCTTTAA GAAATTAAAA 1320
AAACTAAGGA AACATTTTTC TTGTTTCGAG TAGATAATGC CAGCCTGTTA AACGCCGTCG 1380
ACGAGTCTAA CGGACACCAA CCAGCGAACC AGCAGCGTCG CGTCGGGCCA AGCGAAGCAG 1440
ACGGCACGGC ATCTCTGTCG CTGCCTCTGG ACCCCTCTCG AGAGTTCCGC TCCACCGTTG 1500
GACTTGCTCC GCTGTCGGCA TCCAGAAATT GCGTGGCGGA GCGGCAGACG TGAGCCGGCA 1560
CGGCAGGCGG CCTCCTCCTC CTCTCACGGC ACGGCAGCTA CGGGGGATTC CTTTCCCACC 1620
GCTCCTACTA GAGATAATGA GCATTGCATG TCTAAGTTAT AAAAAATTAC CACATATTTT 1680
TTTTGTCACA CTTGTTTGAA GTGCAGTTTA TCTATCTTTA TACATATATT TAAACTTTAC 1740
TCTACGAATA ATATAATCTA TAGTACTACA ATAATATCAG TGTTTTAGAG AATCATATAA 1800
ATGAACAGTT AGACATGGTC TAAAGGACAA TTGAGTATTT TGACAACAGG ACTCTACAGT 1860
TTTATCTTTT TAGTGTGCAT GTGTTCTCCT TTTTTTTTGC AAATAGCTTC ACCTATATAA 1920
TACTTCATCC ATTTTATTAG TACATCCATT TAGGGTTTAG GGTTAATGGT TTTTATAGAC 1980
TAATTTTTTT AGTACATCTA TTTTATTCTA TTTTAGCCTC TAAATTAAGA AAACTAAAAC 2040
TCTATTTTAG TTTTTTTATT TAATAATTTA GATATAAAAT AGAATAAAAT AAAGTGACTA 2100
AAAATTAAAC AAATACCCTT TAAGAAATTA AAAAAACTAA GGAAACATTT TTCTTGTTTC 2160
GAGTAGATAA TGCCAGCCTG TTAAACGCCG TCGACGAGTC TAACGGACAC CAACCAGCGA 2220
ACCAGCAGCG TCGCGTCGGG CCAAGCGAAG CAGACGGCAC GGCATCTCTG TCGCTGCCTC 2280
TGGACCCCTC TCGAGAGTTC CGCTCCACCG TTGGACTTGC TCCGCTGTCG GCATCCAGAA 2340
ATTGCGTGGC GGAGCGGCAG ACGTGAGCCG GCACGGCAGG CGGCCTCCTC CTCCTCTCAC 2400
GGCACGGCAG CTACGGGGGA TTCCTTTCCC ACCGCTCCTA CTAGAACTAG TGGATCCTAT 2460
GCGTATGGTA TGACGTGTGT TCAAGATGAT GACTTCAAAC CTACCTATGA CGTATGGTAT 2520
GACGTGTGTC GACTGATGAC TTAGATCCAC TCGAGCGGCT ATAAATACGT ACCTACGCAC 2580
CCTGCGCTAC CATCCCTAGA GCTGCATGCT TATTTTTACA 2620

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3433 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

TCTAGAGATA ATGAGCATTG CATGTCTAAG TTATAAAAAA TTACCACATA TTTTTTTTGT 60 

CACACTTGTT TGAAGTGCAG TTTATCTATC TTTATACATA TATTTAAACT TTACTCTACG 120 

AATAATATAA TCTATAGTAC TACAATAATA TCAGTGTTTT AGAGAATCAT ATAAATGAAC 180 

AGTTAGACAT GGTCTAAAGG ACAATTGAGT ATTTTGACAA CAGGACTCTA CAGTTTTATC 240

TTTTTAGTGT GCATGTGTTC TCCTTTTTTT TTGCAAATAG CTTCACCTAT ATAATACTTC 300 

ATCCATTTTA TTAGTACATC CATTTAGGGT TTAGGGTTAA TGGTTTTTAT AGACTAATTT 360 

TTTTAGTACA TCTATTTTAT TCTATTTTAG CCTCTAAATT AAGAAAACTA AAACTCTATT 420

TTAGTTTTTT TATTTAATAA TTTAGATATA AAATAGAATA AAATAAAGTG ACTAAAAATT 480

AAACAAATAC CCTTTAAGAA ATTAAAAAAA CTAAGGAAAC ATTTTTCTTG TTTCGAGTAG 540

ATAATGCCAG CCTGTTAAAC GCCGTCGACG AGTCTAACGG ACACCAACCA GCGAACCAGC 600

AGCGTCGCGT CGGGCCAAGC GAAGCAGACG GCACGGCATC TCTGTCGCTG CCTCTGGACC 660
CCTCTCGAGA GTTCCGCTCC ACCGTTGGAC TTGCTCCGCT GTCGGCATCC AGAAATTGCG 720
TGGCGGAGCG GCAGACGTGA GCCGGCACGG CAGGCGGCCT CCTCCTCCTC TCACGGCACG 780
GCAGCTACGG GGGATTCCTT TCCCACCGCT CCTACTAGAG ATAATGAGCA TTGCATGTCT 840

AAGTTATAAA AAATTACCAC ATATTTTTTT TGTCACACTT GTTTGAAGTG CAGTTTATCT 900 
ATCTTTATAC ATATATTTAA ACTTTACTCT ACGAATAATA TAATCTATAG TACTACAATA 960 
ATATCAGTGT TTTAGAGAAT CATATAAATG AACAGTTAGA CATGGTCTAA AGGACAATTG 1020 
AGTATTTTGA CAACAGGACT CTACAGTTTT ATCTTTTTAG TGTGCATGTG TTCTCCTTTT 1080 
TTTTTGCAAA TAGCTTCACC TATATAATAC TTCATCCATT TTATTAGTAC ATCCATTTAG 1140 

GGTTTAGGGT TAATGGTTTT TATAGACTAA TTTTTTTAGT ACATCTATTT TATTCTATTT 1200 

TAGCCTCTAA ATTAAGAAAA CTAAAACTCT ATTTTAGTTT TTTTATTTAA TAATTTAGAT 1260 

ATAAAATAGA ATAAAATAAA GTGACTAAAA ATTAAACAAA TACCCTTTAA GAAATTAAAA 1320 

AAACTAAGGA AACATTTTTC TTGTTTCGAG TAGATAATGC CAGCCTGTTA AACGCCGTCG 1380 

ACGAGTCTAA CGGACACCAA CCAGCGAACC AGCAGCGTCG CGTCGGGCCA AGCGAAGCAG 1440

ACGGCACGGC ATCTCTGTCG CTGCCTCTGG ACCCCTCTCG AGAGTTCCGC TCCACCGTTG 1500

GACTTGCTCC GCTGTCGGCA TCCAGAAATT GCGTGGCGGA GCGGCAGACG TGAGCCGGCA 1560

CGGCAGGCGG CCTCCTCCTC CTCTCACGGC ACGGCAGCTA CGGGGGATTC CTTTCCCACC 1620 

GCTCCTACTA GAGATAATGA GCATTGCATG TCTAAGTTAT AAAAAATTAC CACATATTTT 1680 

TTTTGTCACA CTTGTTTGAA GTGCAGTTTA TCTATCTTTA TACATATATT TAAACTTTAC 1740

TCTACGAATA ATATAATCTA TAGTACTACA ATAATATCAG TGTTTTAGAG AATCATATAA 1800 

ATGAACAGTT AGACATGGTC TAAAGGACAA TTGAGTATTT TGACAACAGG ACTCTACAGT 1860 

TTTATCTTTT TAGTGTGCAT GTGTTCTCCT TTTTTTTTGC AAATAGCTTC ACCTATATAA 1920 

TACTTCATCC ATTTTATTAG TACATCCATT TAGGGTTTAG GGTTAATGGT TTTTATAGAC 1980 

TAATTTTTTT AGTACATCTA TTTTATTCTA TTTTAGCCTC TAAATTAAGA AAACTAAAAC 2040 

TCTATTTTAG TTTTTTTATT TAATAATTTA GATATAAAAT AGAATAAAAT AAAGTGACTA 2100 

AAAATTAAAC AAATACCCTT TAAGAAATTA AAAAAACTAA GGAAACATTT TTCTTGTTTC 2160 

GAGTAGATAA TGCCAGCCTG TTAAACGCCG TCGACGAGTC TAACGGACAC CAACCAGCGA 2220 

ACCAGCAGCG TCGCGTCGGG CCAAGCGAAG CAGACGGCAC GGCATCTCTG TCGCTGCCTC 2280 

TGGACCCCTC TCGAGAGTTC CGCTCCACCG TTGGACTTGC TCCGCTGTCG GCATCCAGAA 2340 

ATTGCGTGGC GGAGCGGCAG ACGTGAGCCG GCACGGCAGG CGGCCTCCTC CTCCTCTCAC 2400

GGCACGGCAG CTACGGGGGA TTCCTTTCCC ACCGCTCCTA CTAGAGATAA TGAGCATTGC 2460

ATGTCTAAGT TATAAAAAAT TACCACATAT TTTTTTTGTC ACACTTGTTT GAAGTGCAGT 2520

TTATCTATCT TTATACATAT ATTTAAACTT TACTCTACGA ATAATATAAT CTATAGTACT 2580 

ACAATAATAT CAGTGTTTTA GAGAATCATA TAAATGAACA GTTAGACATG GTCTAAAGGA 2640 

CAATTGAGTA TTTTGACAAC AGGACTCTAC AGTTTTATCT TTTTAGTGTG CATGTGTTCT 2700 

CCTTTTTTTT TGCAAATAGC TTCACCTATA TAATACTTCA TCCATTTTAT TAGTACATCC 2760

ATTTAGGGTT TAGGGTTAAT GGTTTTTATA GACTAATTTT TTTAGTACAT CTATTTTATT 2820 

CTATTTTAGC CTCTAAATTA AGAAAACTAA AACTCTATTT TAGTTTTTTT ATTTAATAAT 2880 

TTAGATATAA AATAGAATAA AATAAAGTGA CTAAAAATTA AACAAATACC CTTTAAGAAA 2940

TTAAAAAAAC TAAGGAAACA TTTTTCTTGT TTCGAGTAGA TAATGCCAGC CTGTTAAACG 3000 

CCGTCGACGA GTCTAACGGA CACCAACCAG CGAACCAGCA GCGTCGCGTC GGGCCAAGCG 3060 

AAGCAGACGG CACGGCATCT CTGTCGCTGC CTCTGGACCC CTCTCGAGAG TTCCGCTCCA 3120 

CCGTTGGACT TGCTCCGCTG TCGGCATCCA GAAATTGCGT GGCGGAGCGG CAGACGTGAG 3180 

CCGGCACGGC AGGCGGCCTC CTCCTCCTCT CACGGCACGG CAGCTACGGG GGATTCCTTT 3240

CCCACCGCTC CTACTAGAAC TAGTGGATCC TATGCGTATG GTATGACGTG TGTTCAAGAT 3300 

GATGACTTCA AACCTACCTA TGACGTATGG TATGACGTGT GTCGACTGAT GACTTAGATC 3360 

CACTCGAGCG GCTATAAATA CGTACCTACG CACCCTGCGC TACCATCCCT AGAGCTGCAT 3420 

GCTTATTTTT ACA 3433